Removing unneeded self joins
Hi hackers,
There is a join optimization we don't do -- removing inner join of a
table with itself on a unique column. Such joins are generated by
various ORMs, so from time to time our customers ask us to look into
this. Most recently, it was discussed on the list in relation to an
article comparing the optimizations that some DBMS make [1]/messages/by-id/CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw@mail.gmail.com.
I started to explore what can be done about this. Attached is a proof of
concept patch. It works for some simple cases:
create table tt(a int primary key, b text);
explain select p.* from tt p join (select * from tt where b ~~ 'a%') q
on p.a = q.a;
QUERY PLAN
──────────────────────────────────────────────────────
Seq Scan on tt p (cost=0.00..25.88 rows=6 width=36)
Filter: (b ~~ 'a%'::text)
It also works for semi-joins like `explain select p.* from tt p where
exists (select * from tt where b ~~ 'a%' and a = p.a);`. This requires a
preparatory step of reducing unique semi joins to inner joins, and we
already do this (reduce_unique_semijoin).
What this patch tries to do is to remove these inner joins when a single
join is being planned (populate_joinrel_with_paths). The main entry
point is reduce_self_unique_join. First, it proves that both input
relations are uniquely constrained by the same index given the
particular join clauses. We already have a way to find such indexes
(relation_has_unique_index_for), so I was able to reuse this. What I'm
not sure about is how to properly remove the join after that. For now, I
just pretend that the join relation being built is the outer baserel,
add to it the restrictions from the inner relation, and then plan it as
usual. Maybe there is a less hacky way to do it? I've seen elsewhere a
suggestion to use an AppendPath for a similar purpose, but here we can't
just use the outer relation we've already planned because the
restriction list is different.
I'd be glad to hear your thoughts on this.
[1]: /messages/by-id/CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw@mail.gmail.com
/messages/by-id/CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw@mail.gmail.com
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
remove-self-join-v1.patchtext/x-patch; name=remove-self-join-v1.patchDownload
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 477b9f7..334793c 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -76,13 +76,9 @@ static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
-static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel,
- RangeTblEntry *rte);
static void create_plain_partial_paths(PlannerInfo *root, RelOptInfo *rel);
static void set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
-static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
- RangeTblEntry *rte);
static void set_tablesample_rel_size(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte);
static void set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -366,7 +362,7 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
else
{
/* Plain relation */
- set_plain_rel_size(root, rel, rte);
+ set_plain_rel_size(root, rel);
}
break;
case RTE_SUBQUERY:
@@ -449,7 +445,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
else
{
/* Plain relation */
- set_plain_rel_pathlist(root, rel, rte);
+ set_plain_rel_pathlist(root, rel);
}
break;
case RTE_SUBQUERY:
@@ -516,8 +512,8 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
* set_plain_rel_size
* Set size estimates for a plain relation (no subquery, no inheritance)
*/
-static void
-set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+void
+set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel)
{
/*
* Test any partial indexes of rel for applicability. We must do this
@@ -691,8 +687,8 @@ set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel,
* set_plain_rel_pathlist
* Build access paths for a plain relation (no subquery, no inheritance)
*/
-static void
-set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
+void
+set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
{
Relids required_outer;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index f295558..d79b3c7 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2961,7 +2961,7 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
}
/*
- * relation_has_unique_index_for
+ * relation_get_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
* columns of some unique index.
@@ -2982,8 +2982,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* this routine automatically adds in any usable baserestrictinfo clauses.
* (Note that the passed-in restrictlist will be destructively modified!)
*/
-bool
-relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
+IndexOptInfo *
+relation_get_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
{
@@ -2993,7 +2993,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
- return false;
+ return NULL;
/*
* Examine the rel's restriction clauses for usable var = const clauses
@@ -3034,7 +3034,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Short-circuit the easy case */
if (restrictlist == NIL && exprlist == NIL)
- return false;
+ return NULL;
/* Examine each index of the relation ... */
foreach(ic, rel->indexlist)
@@ -3131,10 +3131,10 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all columns of this index? */
if (c == ind->ncolumns)
- return true;
+ return ind;
}
- return false;
+ return NULL;
}
/*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 7008e13..d57ac82 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -15,11 +15,15 @@
#include "postgres.h"
#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
+#include "optimizer/plancat.h"
+#include "optimizer/predtest.h"
#include "optimizer/prep.h"
+#include "optimizer/restrictinfo.h"
#include "partitioning/partbounds.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -747,6 +751,240 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+static bool
+has_relation_reference_walker(Node *node, void *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ return ((Var *) node)->varno == (Index) (intptr_t) context;
+
+ return expression_tree_walker(node, has_relation_reference_walker, context);
+}
+
+static bool
+has_relation_reference(PathTarget *target, Index relid)
+{
+ return has_relation_reference_walker((Node *) target->exprs,
+ (void *) (intptr_t) relid);
+}
+
+static bool
+set_varno_walker(Node *node, void *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ ((Var *) node)->varno = (Index) (intptr_t) context;
+
+ return expression_tree_walker(node, set_varno_walker, context);
+}
+
+static void
+set_varno(Expr *target, Index relid)
+{
+ set_varno_walker((Node *) target, (void*) (intptr_t) relid);
+}
+
+static inline bool
+clause_sides_match_join(RestrictInfo *rinfo, RelOptInfo *outerrel,
+ RelOptInfo *innerrel)
+{
+ if (bms_is_subset(rinfo->left_relids, outerrel->relids) &&
+ bms_is_subset(rinfo->right_relids, innerrel->relids))
+ {
+ /* lefthand side is outer */
+ rinfo->outer_is_left = true;
+ return true;
+ }
+ else if (bms_is_subset(rinfo->left_relids, innerrel->relids) &&
+ bms_is_subset(rinfo->right_relids, outerrel->relids))
+ {
+ /* righthand side is outer */
+ rinfo->outer_is_left = false;
+ return true;
+ }
+ return false; /* no good for these input relations */
+}
+
+static void
+init_simple_rel(PlannerInfo *root, RelOptInfo *rel, Index relid)
+{
+ RangeTblEntry *rte;
+
+ /* Fetch RTE for relation */
+ rte = root->simple_rte_array[relid];
+ Assert(rte != NULL);
+
+ rel->reloptkind = RELOPT_BASEREL;
+ rel->rtekind = rte->rtekind;
+ rel->rows = 0;
+ rel->relid = relid;
+
+ /* min_attr, max_attr, attr_needed, attr_widths are set below */
+
+ /* Check type of rtable entry */
+ switch (rte->rtekind)
+ {
+ case RTE_RELATION:
+ /* Table --- retrieve statistics from the system catalogs */
+ get_relation_info(root, rte->relid, rte->inh, rel);
+ break;
+ case RTE_SUBQUERY:
+ case RTE_FUNCTION:
+ case RTE_TABLEFUNC:
+ case RTE_VALUES:
+ case RTE_CTE:
+ case RTE_NAMEDTUPLESTORE:
+
+ /*
+ * Subquery, function, tablefunc, values list, CTE, or ENR --- set
+ * up attr range and arrays
+ *
+ * Note: 0 is included in range to support whole-row Vars
+ */
+ rel->min_attr = 0;
+ rel->max_attr = list_length(rte->eref->colnames);
+ rel->attr_needed = (Relids *)
+ palloc0((rel->max_attr - rel->min_attr + 1) * sizeof(Relids));
+ rel->attr_widths = (int32 *)
+ palloc0((rel->max_attr - rel->min_attr + 1) * sizeof(int32));
+ break;
+ default:
+ elog(ERROR, "unrecognized RTE kind: %d",
+ (int) rte->rtekind);
+ break;
+ }
+}
+
+static IndexOptInfo *
+get_inner_unique_index(PlannerInfo *root, RelOptInfo *innerRel,
+ RelOptInfo *outerRel, List *restrictlist)
+{
+ ListCell *lc;
+ List *rl = list_copy(restrictlist);
+ foreach(lc, rl)
+ {
+ if (!clause_sides_match_join((RestrictInfo *) lfirst(lc),
+ outerRel, innerRel))
+ return NULL;
+ }
+
+ return relation_get_unique_index_for(root, innerRel, rl, 0, 0);
+}
+
+/*
+ * This function optimizes out an inner join of a relation to itself on a
+ * unique column. When the inner relation is not referenced by the targetlist,
+ * we can transfer the filters from the inner relation to the outer one, and
+ * scan it instead of joining.
+ */
+static bool
+reduce_self_unique_join(PlannerInfo *root, RelOptInfo *outerrel,
+ RelOptInfo *innerrel, RelOptInfo *joinrel,
+ List *restrictlist)
+{
+ List *mergedRinfos;
+ ListCell *lc;
+ IndexOptInfo *outeridx, *inneridx;
+ Bitmapset *oldRelids = joinrel->relids;
+
+ outeridx = get_inner_unique_index(root, outerrel, innerrel, restrictlist);
+ if (!outeridx)
+ return false;
+
+ /*
+ * Join can be reduced only if the unique constraint is immediate.
+ * Otherwise, the constraint does not always hold inside a transaction.
+ */
+ if (!outeridx->immediate)
+ return false;
+
+ inneridx = get_inner_unique_index(root, innerrel, outerrel, restrictlist);
+ if (!inneridx)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outeridx->indexoid != inneridx->indexoid)
+ return false;
+
+ /* A sanity check: this is the same index on the same relation. */
+ Assert(root->simple_rte_array[outerrel->relid]->relid
+ == root->simple_rte_array[innerrel->relid]->relid);
+
+ /*
+ * If some columns of the inner relation are in the target list,
+ * we have to keep it.
+ */
+ if (has_relation_reference(joinrel->reltarget, innerrel->relid))
+ return false;
+
+ /*
+ * All the necessary conditions hold and we can change the join
+ * into a scan on the outer relation. First, append the filters
+ * from inner relation to the outer.
+ */
+ mergedRinfos = list_copy(outerrel->baserestrictinfo);
+ foreach(lc, innerrel->baserestrictinfo)
+ {
+ RestrictInfo *oldRinfo = lfirst_node(RestrictInfo, lc);
+ RestrictInfo *newRinfo;
+ Expr *newClause;
+
+ if (is_redundant_derived_clause(oldRinfo, mergedRinfos))
+ continue; /* derived from same EquivalenceClass */
+
+ newClause = copyObject(oldRinfo->clause);
+ set_varno(newClause, outerrel->relid);
+
+ /*
+ * !!!FIXME This check doesn't work, because predicate_implied_by
+ * uses equal() to compare Var nodes. For the Vars transfered from
+ * another relation, some parameters such as varoattno and location
+ * are different. They are not meaningful for the purposes of
+ * Maybe just don't compare them?
+ */
+ if (!contain_mutable_functions((Node *) newClause)
+ && predicate_implied_by(list_make1(newClause), mergedRinfos,
+ /* weak = */ false ))
+ continue; /* provably implied by r1 */
+
+ /*
+ * Make a new rinfo instead of copying, because it has quite a few
+ * internal fields that depend on relid.
+ */
+ newRinfo = make_restrictinfo(newClause,
+ oldRinfo->is_pushed_down,
+ oldRinfo->outerjoin_delayed,
+ oldRinfo->pseudoconstant,
+ oldRinfo->security_level,
+ bms_make_singleton(outerrel->relid) /* required relids */,
+ NULL /* outer relids */,
+ oldRinfo->nullable_relids);
+ mergedRinfos = lappend(mergedRinfos, newRinfo);
+ }
+
+ /*
+ * Pretend that the join relation is a base relation, and
+ * plan it as usual.
+ */
+ init_simple_rel(root, joinrel, outerrel->relid);
+
+ joinrel->baserestrictinfo = mergedRinfos;
+ joinrel->relids = bms_make_singleton(joinrel->relid);
+
+ set_plain_rel_size(root, joinrel);
+ set_plain_rel_pathlist(root, joinrel);
+
+ joinrel->relids = oldRelids;
+
+ elog(LOG, "Removed self join.");
+
+ return true;
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -786,6 +1024,10 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
mark_dummy_rel(joinrel);
break;
}
+
+ if (reduce_self_unique_join(root, rel1, rel2, joinrel, restrictlist))
+ break;
+
add_paths_to_joinrel(root, joinrel, rel1, rel2,
JOIN_INNER, sjinfo,
restrictlist);
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 0e73f9c..128af10 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -601,7 +601,7 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* reference to unique indexes. Make sure there's at least one
* suitable unique index. It must be immediately enforced, and if
* it's a partial index, it must match the query. (Keep these
- * conditions in sync with relation_has_unique_index_for!)
+ * conditions in sync with relation_get_unique_index_for!)
*/
ListCell *lc;
@@ -658,10 +658,10 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_get_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_get_unique_index_for(root, rel, clause_list, NIL, NIL))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 0317763..bc9741a 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -4593,9 +4593,17 @@ fix_indexqual_references(PlannerInfo *root, IndexPath *index_path)
* Check to see if the indexkey is on the right; if so, commute
* the clause. The indexkey should be the side that refers to
* (only) the base relation.
+ *
+ * index->rel->relids can be a multi-bit set, so the check
+ * cannot be simplified to a plain comparison. The multi-bit
+ * relids are produced by the unique inner join removal, see
+ * reduce_self_unique_join().
*/
- if (!bms_equal(rinfo->left_relids, index->rel->relids))
+ if (!bms_is_empty(rinfo->right_relids)
+ && bms_is_subset(rinfo->right_relids, index->rel->relids))
+ {
CommuteOpExpr(op);
+ }
/*
* Now replace the indexkey expression with an index Var.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e190ad4..c65474f 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1583,7 +1583,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
* clauses for the rel, as well.
*/
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
- relation_has_unique_index_for(root, rel, NIL,
+ relation_get_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
sjinfo->semi_operators))
{
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..d2f8bcb 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -50,6 +50,8 @@ extern PGDLLIMPORT join_search_hook_type join_search_hook;
extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
extern void set_dummy_rel_pathlist(RelOptInfo *rel);
+extern void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel);
+extern void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
List *initial_rels);
@@ -71,7 +73,7 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
* routines to generate index paths
*/
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
-extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
+extern IndexOptInfo *relation_get_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index cbc882d..7f3691f 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2918,6 +2918,65 @@ where nt3.id = 1 and ss2.b3;
(1 row)
--
+-- test removal of semi or inner joins on unique columns
+-- !!!FIXME check coverage
+--
+analyze nt1;
+explain (costs off)
+select a.* from nt1 a
+join (select * from nt1 where a1) b on a.id = b.id;
+ QUERY PLAN
+-------------------
+ Seq Scan on nt1 a
+ Filter: a1
+(2 rows)
+
+explain (costs off)
+select a.* from nt1 a
+where exists (select * from nt1 where id = a.id) and id < 2;
+ QUERY PLAN
+--------------------
+ Seq Scan on nt1 a
+ Filter: (id < 2)
+(2 rows)
+
+explain (costs off)
+select a.* from nt1 a
+where exists (select * from nt1 where id = a.id and id < 2);
+ QUERY PLAN
+--------------------
+ Seq Scan on nt1 a
+ Filter: (id < 2)
+(2 rows)
+
+explain (costs off)
+select a.* from nt1 a
+where exists (select * from nt1 where id = a.id) and a1;
+ QUERY PLAN
+-------------------
+ Seq Scan on nt1 a
+ Filter: a1
+(2 rows)
+
+explain (costs off)
+select a.* from nt1 a
+where exists (select * from nt1 where id = a.id and a1);
+ QUERY PLAN
+-------------------
+ Seq Scan on nt1 a
+ Filter: a1
+(2 rows)
+
+explain (costs off)
+select a.* from nt1 a
+where exists (select * from nt1 where id = a.id and a1) and id < 2;
+ QUERY PLAN
+-----------------------------
+ Seq Scan on nt1 a
+ Filter: (a1 AND (id < 2))
+(2 rows)
+
+--
-- test case where a PlaceHolderVar is propagated into a subquery
--
explain (costs off)
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 86c6d5b..1fc6a02 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -851,6 +851,36 @@ from nt3 as nt3
where nt3.id = 1 and ss2.b3;
--
+-- test removal of semi or inner joins on unique columns
+-- !!!FIXME check coverage
+--
+analyze nt1;
+
+explain (costs off)
+select a.* from nt1 a
+join (select * from nt1 where a1) b on a.id = b.id;
+
+explain (costs off)
+select a.* from nt1 a
+where exists (select * from nt1 where id = a.id) and id < 2;
+
+explain (costs off)
+select a.* from nt1 a
+where exists (select * from nt1 where id = a.id and id < 2);
+
+explain (costs off)
+select a.* from nt1 a
+where exists (select * from nt1 where id = a.id) and a1;
+
+explain (costs off)
+select a.* from nt1 a
+where exists (select * from nt1 where id = a.id and a1);
+
+explain (costs off)
+select a.* from nt1 a
+where exists (select * from nt1 where id = a.id and a1) and id < 2;
+
+--
-- test case where a PlaceHolderVar is propagated into a subquery
--
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
There is a join optimization we don't do -- removing inner join of a
table with itself on a unique column. Such joins are generated by
various ORMs, so from time to time our customers ask us to look into
this. Most recently, it was discussed on the list in relation to an
article comparing the optimizations that some DBMS make [1].
This is the sort of thing that I always wonder why the customers don't
ask the ORM to stop generating such damfool queries. Its *expensive*
for us to clean up after their stupidity; almost certainly, it would
take far fewer cycles, net, for them to be a bit smarter in the first
place.
regards, tom lane
On Wed, May 16, 2018 at 12:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
There is a join optimization we don't do -- removing inner join of a
table with itself on a unique column. Such joins are generated by
various ORMs, so from time to time our customers ask us to look into
this. Most recently, it was discussed on the list in relation to an
article comparing the optimizations that some DBMS make [1].This is the sort of thing that I always wonder why the customers don't
ask the ORM to stop generating such damfool queries. Its *expensive*
for us to clean up after their stupidity; almost certainly, it would
take far fewer cycles, net, for them to be a bit smarter in the first
place.
The trouble, of course, is that the customer didn't write the ORM,
likely has no idea how it works, and doesn't want to run a modified
version of it even if they do. If the queries run faster on other
systems than they do on PostgreSQL, we get dinged -- not unjustly.
Also, I'm not sure that I believe that it's always easy to avoid
generating such queries. I mean, this case is trivial so it's easy to
say, well, just rewrite the query. But suppose that I have a fact
table over which I've created two views, each of which performs
various joins between the fact table and various lookup tables. My
queries are such that I normally need the joins in just one of these
two views and not the other to fetch the information I care about.
But every once in a while I need to run a report that involves pulling
every column possible. The obvious solution is to join the views on
the underlying table's primary key, but then you get this problem. Of
course there's a workaround: define a third view that does both sets
of joins-to-lookup-tables. But that starts to feel like you're
handholding the database; surely it's the database's job to optimize
queries, not the user's.
It's been about 10 years since I worked as a web developer, but I do
remember hitting this kind of problem from time to time and I'd really
like to see us do something about it. I wish we could optimize away
inner joins, too, for similar reasons.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2018-05-16 12:26:48 -0400, Robert Haas wrote:
Also, I'm not sure that I believe that it's always easy to avoid
generating such queries.
Yea. There's obviously plenty cases where ORMs just want to make the
database hurt. But especially when building a join between a number of
tables based on various fields, it's not going to be easy for the ORM to
figure out which ones can be safely omitted. It'd need similar
optimization as we'd have to do, without having the infrastructure core
PG has. And then there's, as you say, views etc...
Greetings,
Andres Freund
On 16 May 2018 at 11:26, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, May 16, 2018 at 12:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
There is a join optimization we don't do -- removing inner join of a
table with itself on a unique column. Such joins are generated by
various ORMs, so from time to time our customers ask us to look into
this. Most recently, it was discussed on the list in relation to an
article comparing the optimizations that some DBMS make [1].This is the sort of thing that I always wonder why the customers don't
ask the ORM to stop generating such damfool queries. Its *expensive*
for us to clean up after their stupidity; almost certainly, it would
take far fewer cycles, net, for them to be a bit smarter in the first
place.The trouble, of course, is that the customer didn't write the ORM,
likely has no idea how it works, and doesn't want to run a modified
version of it even if they do. If the queries run faster on other
systems than they do on PostgreSQL, we get dinged -- not unjustly.Also, I'm not sure that I believe that it's always easy to avoid
generating such queries. I mean, this case is trivial so it's easy to
say, well, just rewrite the query. But suppose that I have a fact
table over which I've created two views, each of which performs
various joins between the fact table and various lookup tables. My
queries are such that I normally need the joins in just one of these
two views and not the other to fetch the information I care about.
But every once in a while I need to run a report that involves pulling
every column possible. The obvious solution is to join the views on
the underlying table's primary key, but then you get this problem. Of
course there's a workaround: define a third view that does both sets
of joins-to-lookup-tables. But that starts to feel like you're
handholding the database; surely it's the database's job to optimize
queries, not the user's.It's been about 10 years since I worked as a web developer, but I do
remember hitting this kind of problem from time to time and I'd really
like to see us do something about it. I wish we could optimize away
inner joins, too, for similar reasons.
I agree with everything you say.
What I would add is that I've seen cases where the extra joins do NOT
hurt performance, so the extra CPU used to remove the join hurts more
than the benefit of removing it. Yes, we tried it.
More advanced optimizations should only be applied when we've assessed
that the likely run time is high enough to make it worth investing in
further optimization.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
What I would add is that I've seen cases where the extra joins do NOT
hurt performance, so the extra CPU used to remove the join hurts more
than the benefit of removing it. Yes, we tried it.
Interesting. The concern I had was more about the cost imposed on every
query to detect self-joins and try to prove them useless, even in queries
where no benefit ensues. It's possible that we can get that down to the
point where it's negligible; but this says that even the successful-proof
case has to be very cheap.
regards, tom lane
On May 16, 2018, at 1:58 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2018-05-16 12:26:48 -0400, Robert Haas wrote:
Also, I'm not sure that I believe that it's always easy to avoid
generating such queries.Yea. There's obviously plenty cases where ORMs just want to make the
database hurt. But especially when building a join between a number of
tables based on various fields, it's not going to be easy for the ORM to
figure out which ones can be safely omitted. It'd need similar
optimization as we'd have to do, without having the infrastructure core
PG has. And then there's, as you say, views etc…
Are there specific examples of what the ORM code is that generated
the SQL? I’m more curious to see what people are writing that
generates such code. As earlier mentioned we could always report back
to the specific ORM maintainer(s) such examples and see if they could
tweak.
Jonathan
On 16 May 2018 at 15:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
What I would add is that I've seen cases where the extra joins do NOT
hurt performance, so the extra CPU used to remove the join hurts more
than the benefit of removing it. Yes, we tried it.Interesting. The concern I had was more about the cost imposed on every
query to detect self-joins and try to prove them useless, even in queries
where no benefit ensues. It's possible that we can get that down to the
point where it's negligible; but this says that even the successful-proof
case has to be very cheap.
What I was advocating was an approach that varies according to the
query cost, so we don't waste time trying to tune the heck out of OLTP
queries, but for larger queries we might take a more considered
approach.
For advanced optimizations that are costly to check for, skip the
check if we are already below a cost threshold. The threshold would be
a heuristic that varies according to the cost of the check.
I realise that in this case we wouldn't know the full query cost until
we've done join planning, so we would need some lower bound estimate
to check whether its worth trying to remove joins.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 17 May 2018 at 03:43, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:
I'd be glad to hear your thoughts on this.
(I only glanced at the patch)
I've thought and discussed this before on this list. I think the
arguments for and against it were much the same as you've received
already. If you trawl through the archives you'll see my argument for
matches quite closely to Robert regarding the nested-views. I
personally experienced this issue in my previous job, although it was
not with PostgreSQL.
I think it's worth doing this providing that we can fast-path out
quickly enough in cases where we can't possibly remove anything.
Likely the success of this patch depends on how quick that fast-path
is.
From my experience on join removals, I imagine all this can be done
just after the left join removal code has completed. I see your patch
does it much later, which I don't think is particularly great since
Paths have already been generated by that time. I think it makes sense
to do this as early as possible to save wasting planning work for
relations that will be removed.
I think all this can be done just after left joins are removed by
remove_useless_joins. You may like to move the code that exists in
that function today into a new static function named
remove_useless_left_joins, and put this new code in new static
function named remove_useless_self_joins:
1. Allocate an array root->simple_rel_array_size in size. Populate it
with a struct which is defined as struct { Index relid; Oid oid; }
2. Populate that array by looping over the simple_rel_array. Ignore
anything that's not a baserel with relkind = 'r'
3. qsort the array on Oid.
4. Make a pass over the array (up to its size - 1) looking for
elements where the current oid is the same as the next. Build a List
of RelIds containing all relids of Oids which are duplicated.
5. If no pairs. Abort.
6. Process each combination of pairs found in each Relids in the list
made in step 1. Probably start at the lowest relid.
7. For each pair:
a. If there's a join condition, ensure all join OpExprs are equality
exprs with a mergejoinable opno (copy what left join removal check
with the opno used). Ensure Vars used in the OpExpr have the same
attrno on each side.
b. For bonus points don't reject non-Vars used in the join
condition, but ensure they're equal and there are no non-immutable
functions inside.
c. Ensure relation_has_unique_index_for returns true for the Vars
(and Exprs if doing b) used in the join condition.
d. Choose the relation with the highest relid and rewrite the parse
changing the varno of all Vars to use the one of the relation with the
lowest relid.
e. list_concat baserestictinfos from removed relation onto other relation.
f. Check for Vars in the join condition that can contain NULLs and
lappend IS NOT NULLs into the baserestrictinfo. (Removing the join
could have removed NULL filtering)
g. Mark highest relid relation as DEAD. (See what the left join
removal code does (you may need to do some extra work around
equivalence classes))
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 17 May 2018 at 08:44, Simon Riggs <simon@2ndquadrant.com> wrote:
What I was advocating was an approach that varies according to the
query cost, so we don't waste time trying to tune the heck out of OLTP
queries, but for larger queries we might take a more considered
approach.
That's tricky. If we do this, it should be done before Path
generation, so not much is known about the costs in those case.
Perhaps something can be done by looking at the number of relpages,
but I've no idea what that would be. Perhaps we need to see how costly
this operation is first before we try to think of ways to only apply
it conditionally?
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
David,
Many thanks for the detailed explanation. I'll try to code it up and
measure how much overhead it introduces.
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Thu, May 17, 2018 at 3:43 AM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:
There is a join optimization we don't do -- removing inner join of a table
with itself on a unique column. Such joins are generated by various ORMs, so
from time to time our customers ask us to look into this. Most recently, it
was discussed on the list in relation to an article comparing the
optimizations that some DBMS make [1]....
I'd be glad to hear your thoughts on this.
+1
Some thoughts:
There might be some interesting corner cases involving self-joins in
UPDATE/DELETE statements, and also FOR UPDATE etc. Those can result
in some surprising results in a self-join (one side is subject to EPQ
and the other isn't) which I think might be changed by your patch
(though I didn't try it or read the patch very closely).
IIUC in DB2 (the clear winner at join elimination in the article you
mentioned), you get these sorts of things by default (optimisation
level 5 includes it), but not if you SET CURRENT QUERY OPTIMIZATION =
3 as many articles recommend for OLTP work. I think it's interesting
that they provide that knob rather than something automatic, and
interesting that there is one linear knob to classify your workload
rather than N knobs for N optimisations.
--
Thomas Munro
http://www.enterprisedb.com
HI,
On 2018-05-17 08:48:58 +1200, David Rowley wrote:
On 17 May 2018 at 08:44, Simon Riggs <simon@2ndquadrant.com> wrote:
What I was advocating was an approach that varies according to the
query cost, so we don't waste time trying to tune the heck out of OLTP
queries, but for larger queries we might take a more considered
approach.That's tricky. If we do this, it should be done before Path
generation, so not much is known about the costs in those case.Perhaps something can be done by looking at the number of relpages,
but I've no idea what that would be. Perhaps we need to see how costly
this operation is first before we try to think of ways to only apply
it conditionally?
I'm also not buying that this isn't a benefit in OLTP in general. Sure,
for a single query RTT costs are going to dominate, but if you use
prepared statements the costs are going to pay of over multiple
executions. Even just avoiding initializing unnecessary executor nodes
shows up in profiles.
Greetings,
Andres Freund
David Rowley <david.rowley@2ndquadrant.com> writes:
On 17 May 2018 at 08:44, Simon Riggs <simon@2ndquadrant.com> wrote:
What I was advocating was an approach that varies according to the
query cost, so we don't waste time trying to tune the heck out of OLTP
queries, but for larger queries we might take a more considered
approach.
That's tricky. If we do this, it should be done before Path
generation, so not much is known about the costs in those case.
Yeah. It'd have to be a very heuristic thing that doesn't account
for much beyond the number of relations in the query, and maybe their
sizes --- although I don't think we even know the latter at the
point where join removal would be desirable. (And note that one of
the desirable benefits of join removal is not having to find out the
sizes of removed rels ... so just swapping that around doesn't appeal.)
regards, tom lane
On 17 May 2018 at 10:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yeah. It'd have to be a very heuristic thing that doesn't account
for much beyond the number of relations in the query, and maybe their
sizes --- although I don't think we even know the latter at the
point where join removal would be desirable. (And note that one of
the desirable benefits of join removal is not having to find out the
sizes of removed rels ... so just swapping that around doesn't appeal.)
There's probably some argument for delaying obtaining the relation
size until after join removal and probably partition pruning too, but
it's currently done well before that in build_simple_rel, where the
RelOptInfo is built.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Thomas Munro <thomas.munro@enterprisedb.com> writes:
IIUC in DB2 (the clear winner at join elimination in the article you
mentioned), you get these sorts of things by default (optimisation
level 5 includes it), but not if you SET CURRENT QUERY OPTIMIZATION =
3 as many articles recommend for OLTP work. I think it's interesting
that they provide that knob rather than something automatic, and
interesting that there is one linear knob to classify your workload
rather than N knobs for N optimisations.
There's a lot to be said for that type of approach, as opposed to trying
to drive it off some necessarily-very-inexact preliminary estimate of
query cost. For example, the mere fact that you're joining giant tables
doesn't in itself suggest that extra efforts in query optimization will be
repaid. (If anything, it seems more likely that the user would've avoided
silliness like useless self-joins in such a case.)
A different line of thought is that, to me, the most intellectually
defensible rationale for efforts like const-simplification and join
removal is that opportunities for those things can arise after view
expansion, even in queries where the original query text didn't seem
to contain anything extraneous. (Robert and Andres alluded to this
upthread, but not very clearly.) So maybe we could track how much
the query got changed during rewriting, and use that to drive the
planner's decisions about how hard to work later on. But I'm not
very sure that this'd be superior to having a user-visible knob.
regards, tom lane
On 2018-05-16 18:37:11 -0400, Tom Lane wrote:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
IIUC in DB2 (the clear winner at join elimination in the article you
mentioned), you get these sorts of things by default (optimisation
level 5 includes it), but not if you SET CURRENT QUERY OPTIMIZATION =
3 as many articles recommend for OLTP work. I think it's interesting
that they provide that knob rather than something automatic, and
interesting that there is one linear knob to classify your workload
rather than N knobs for N optimisations.There's a lot to be said for that type of approach, as opposed to trying
to drive it off some necessarily-very-inexact preliminary estimate of
query cost. For example, the mere fact that you're joining giant tables
doesn't in itself suggest that extra efforts in query optimization will be
repaid. (If anything, it seems more likely that the user would've avoided
silliness like useless self-joins in such a case.)
For prepared statements we could also start making more expensive
optimizations after the first execution, when we know how long the query
took / how expensive it was (also, if we had a plan cache...).
Greetings,
Andres Freund
On 17 May 2018 at 10:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
IIUC in DB2 (the clear winner at join elimination in the article you
mentioned), you get these sorts of things by default (optimisation
level 5 includes it), but not if you SET CURRENT QUERY OPTIMIZATION =
3 as many articles recommend for OLTP work. I think it's interesting
that they provide that knob rather than something automatic, and
interesting that there is one linear knob to classify your workload
rather than N knobs for N optimisations.There's a lot to be said for that type of approach, as opposed to trying
to drive it off some necessarily-very-inexact preliminary estimate of
query cost. For example, the mere fact that you're joining giant tables
doesn't in itself suggest that extra efforts in query optimization will be
repaid. (If anything, it seems more likely that the user would've avoided
silliness like useless self-joins in such a case.)A different line of thought is that, to me, the most intellectually
defensible rationale for efforts like const-simplification and join
removal is that opportunities for those things can arise after view
expansion, even in queries where the original query text didn't seem
to contain anything extraneous. (Robert and Andres alluded to this
upthread, but not very clearly.) So maybe we could track how much
the query got changed during rewriting, and use that to drive the
planner's decisions about how hard to work later on. But I'm not
very sure that this'd be superior to having a user-visible knob.
This seems like a good line of thought. Perhaps a knob is a good
first step, then maybe having the ability to set that knob to
"automatic" is something to aspire for later.
I don't think Alexander should work on this as part of this patch
though. Perhaps we can re-evaluate when Alexander posts some planner
benchmarks from the patch.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
On 17 May 2018 at 10:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yeah. It'd have to be a very heuristic thing that doesn't account
for much beyond the number of relations in the query, and maybe their
sizes --- although I don't think we even know the latter at the
point where join removal would be desirable. (And note that one of
the desirable benefits of join removal is not having to find out the
sizes of removed rels ... so just swapping that around doesn't appeal.)
There's probably some argument for delaying obtaining the relation
size until after join removal and probably partition pruning too, but
it's currently done well before that in build_simple_rel, where the
RelOptInfo is built.
Yeah, but that's something we ought to fix someday; IMO it's an artifact
of having wedged in remove_useless_joins without doing the extensive
refactoring that'd be needed to do it at a more desirable time. I don't
want to build user-visible behavior that's dependent on doing that wrong.
(But wait a second ... we could improve this without quite that much work:
instead of doing estimate_rel_size immediately during get_relation_info,
couldn't it be left until the set_base_rel_sizes pass? Since
RelationGetNumberOfBlocks involves kernel calls, skipping it for removed
rels seems worth doing.)
regards, tom lane
On 2018-05-16 18:55:41 -0400, Tom Lane wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
On 17 May 2018 at 10:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yeah. It'd have to be a very heuristic thing that doesn't account
for much beyond the number of relations in the query, and maybe their
sizes --- although I don't think we even know the latter at the
point where join removal would be desirable. (And note that one of
the desirable benefits of join removal is not having to find out the
sizes of removed rels ... so just swapping that around doesn't appeal.)There's probably some argument for delaying obtaining the relation
size until after join removal and probably partition pruning too, but
it's currently done well before that in build_simple_rel, where the
RelOptInfo is built.Yeah, but that's something we ought to fix someday; IMO it's an artifact
of having wedged in remove_useless_joins without doing the extensive
refactoring that'd be needed to do it at a more desirable time. I don't
want to build user-visible behavior that's dependent on doing that wrong.
My patch that introduced a radix tree buffer mapping also keeps an
accurate relation size in memory, making it far cheaper to use. While I
depriorized the patchset for the moment (I'll post what I'm working on
first soon), that should address some of the cost till then.
Wonder if we shouldn't just cache an estimated relation size in the
relcache entry till then. For planning purposes we don't need to be
accurate, and usually activity that drastically expands relation size
will trigger relcache activity before long. Currently there's plenty
workloads where the lseeks(SEEK_END) show up pretty prominently.
Greetings,
Andres Freund
On 17 May 2018 at 10:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
There's probably some argument for delaying obtaining the relation
size until after join removal and probably partition pruning too, but
it's currently done well before that in build_simple_rel, where the
RelOptInfo is built.Yeah, but that's something we ought to fix someday; IMO it's an artifact
of having wedged in remove_useless_joins without doing the extensive
refactoring that'd be needed to do it at a more desirable time. I don't
want to build user-visible behavior that's dependent on doing that wrong.(But wait a second ... we could improve this without quite that much work:
instead of doing estimate_rel_size immediately during get_relation_info,
couldn't it be left until the set_base_rel_sizes pass? Since
RelationGetNumberOfBlocks involves kernel calls, skipping it for removed
rels seems worth doing.)
I did mean just obtaining the sizes, not delaying building the
RelOptInfo. I see nothing that needs RelOptInfo->pages before
set_base_rel_size apart from the code which I mentioned about moving a
couple of days ago in [1]/messages/by-id/CAKJS1f_eUz0_h5_vU1rqE7wuxMcoENcWK2FTODz0pOyxp3_Uig@mail.gmail.com.
[1]: /messages/by-id/CAKJS1f_eUz0_h5_vU1rqE7wuxMcoENcWK2FTODz0pOyxp3_Uig@mail.gmail.com
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 17 May 2018 at 11:00, Andres Freund <andres@anarazel.de> wrote:
Wonder if we shouldn't just cache an estimated relation size in the
relcache entry till then. For planning purposes we don't need to be
accurate, and usually activity that drastically expands relation size
will trigger relcache activity before long. Currently there's plenty
workloads where the lseeks(SEEK_END) show up pretty prominently.
While I'm in favour of speeding that up, I think we'd get complaints
if we used a stale value. We could have uses pg_class.relpages all
along, but it would cause the planner to not work so well in face of
the relation changing size significantly between analyze runs.
FWIW the major case where that does show up is when generating a plan
for a partitioned table with many partitions then pruning all but a
few of them.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes:
On 17 May 2018 at 11:00, Andres Freund <andres@anarazel.de> wrote:
Wonder if we shouldn't just cache an estimated relation size in the
relcache entry till then. For planning purposes we don't need to be
accurate, and usually activity that drastically expands relation size
will trigger relcache activity before long. Currently there's plenty
workloads where the lseeks(SEEK_END) show up pretty prominently.
While I'm in favour of speeding that up, I think we'd get complaints
if we used a stale value.
Yeah, that scares me too. We'd then be in a situation where (arguably)
any relation extension should force a relcache inval. Not good.
I do not buy Andres' argument that the value is noncritical, either ---
particularly during initial population of a table, where the size could
go from zero to something-significant before autoanalyze gets around
to noticing.
I'm a bit skeptical of the idea of maintaining an accurate relation
size in shared memory, too. AIUI, a lot of the problem we see with
lseek(SEEK_END) has to do with contention inside the kernel for access
to the single-point-of-truth where the file's size is kept. Keeping
our own copy would eliminate kernel-call overhead, which can't hurt,
but it won't improve the contention angle.
regards, tom lane
On 2018-05-16 22:11:22 -0400, Tom Lane wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
On 17 May 2018 at 11:00, Andres Freund <andres@anarazel.de> wrote:
Wonder if we shouldn't just cache an estimated relation size in the
relcache entry till then. For planning purposes we don't need to be
accurate, and usually activity that drastically expands relation size
will trigger relcache activity before long. Currently there's plenty
workloads where the lseeks(SEEK_END) show up pretty prominently.While I'm in favour of speeding that up, I think we'd get complaints
if we used a stale value.Yeah, that scares me too. We'd then be in a situation where (arguably)
any relation extension should force a relcache inval. Not good.
I do not buy Andres' argument that the value is noncritical, either ---
particularly during initial population of a table, where the size could
go from zero to something-significant before autoanalyze gets around
to noticing.
I don't think every extension needs to force a relcache inval. It'd
instead be perfectly reasonable to define a rule that an inval is
triggered whenever crossing a 10% relation size boundary. Which'll lead
to invalidations for the first few pages, but much less frequently
later.
I'm a bit skeptical of the idea of maintaining an accurate relation
size in shared memory, too. AIUI, a lot of the problem we see with
lseek(SEEK_END) has to do with contention inside the kernel for access
to the single-point-of-truth where the file's size is kept. Keeping
our own copy would eliminate kernel-call overhead, which can't hurt,
but it won't improve the contention angle.
A syscall is several hundred instructions. An unlocked read - which'll
be be sufficient in many cases, given that the value can quickly be out
of date anyway - is a few cycles. Even with a barrier you're talking a
few dozen cycles. So I can't see how it'd not improve the contention.
But the main reason for keeping it in shmem is less the lseek avoidance
- although that's nice, context switches aren't great - but to make
relation extension need far less locking.
Greetings,
Andres Freund
On 17.05.2018 05:19, Andres Freund wrote:
On 2018-05-16 22:11:22 -0400, Tom Lane wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
On 17 May 2018 at 11:00, Andres Freund <andres@anarazel.de> wrote:
Wonder if we shouldn't just cache an estimated relation size in the
relcache entry till then. For planning purposes we don't need to be
accurate, and usually activity that drastically expands relation size
will trigger relcache activity before long. Currently there's plenty
workloads where the lseeks(SEEK_END) show up pretty prominently.While I'm in favour of speeding that up, I think we'd get complaints
if we used a stale value.Yeah, that scares me too. We'd then be in a situation where (arguably)
any relation extension should force a relcache inval. Not good.
I do not buy Andres' argument that the value is noncritical, either ---
particularly during initial population of a table, where the size could
go from zero to something-significant before autoanalyze gets around
to noticing.I don't think every extension needs to force a relcache inval. It'd
instead be perfectly reasonable to define a rule that an inval is
triggered whenever crossing a 10% relation size boundary. Which'll lead
to invalidations for the first few pages, but much less frequently
later.I'm a bit skeptical of the idea of maintaining an accurate relation
size in shared memory, too. AIUI, a lot of the problem we see with
lseek(SEEK_END) has to do with contention inside the kernel for access
to the single-point-of-truth where the file's size is kept. Keeping
our own copy would eliminate kernel-call overhead, which can't hurt,
but it won't improve the contention angle.A syscall is several hundred instructions. An unlocked read - which'll
be be sufficient in many cases, given that the value can quickly be out
of date anyway - is a few cycles. Even with a barrier you're talking a
few dozen cycles. So I can't see how it'd not improve the contention.But the main reason for keeping it in shmem is less the lseek avoidance
- although that's nice, context switches aren't great - but to make
relation extension need far less locking.Greetings,
Andres Freund
I completely agree with Andreas. In my multithreaded Postgres prototype
file description cache (shared by all threads) becomes bottleneck
exactly because of each query execution requires
access to file system (lseek) to provide optimizer estimation of the
relation size, despite to the fact that all database fits in memory.
Well, this is certainly specific of shared descriptor's pool in my
prototype, but the fact the we have to perform lseek at each query
compilation seems to be annoying in any case.
And there is really no problem that cached relation size estimation is
not precise. It really can be invalidated even if relation size is
changed more than some threshold value (1Mb?) or lease time for cached
value is expired.
May be it is reasonable to implement specific invalidation for relation
size esimation, to avoid complete invalidation and reconstruction of
relation description and all dependent objects.
In this case time-based invalidation seems to be the easiest choice to
implement. Repeating lseek each 10 or 1 second seems to have no
noticeable impact on performance and relation size can not dramatically
changed during this time.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Wed, May 16, 2018 at 6:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <david.rowley@2ndquadrant.com> writes:
On 17 May 2018 at 08:44, Simon Riggs <simon@2ndquadrant.com> wrote:
What I was advocating was an approach that varies according to the
query cost, so we don't waste time trying to tune the heck out of OLTP
queries, but for larger queries we might take a more considered
approach.That's tricky. If we do this, it should be done before Path
generation, so not much is known about the costs in those case.Yeah. It'd have to be a very heuristic thing that doesn't account
for much beyond the number of relations in the query, and maybe their
sizes --- although I don't think we even know the latter at the
point where join removal would be desirable. (And note that one of
the desirable benefits of join removal is not having to find out the
sizes of removed rels ... so just swapping that around doesn't appeal.)
As I've mentioned before, the problem we're talking about here is also
highly relevant to parallel query. We only want to bother generating
partial paths for queries that are expensive enough to justify
considering parallelism, but we don't know how expensive they are
until we finishing planning. The only way I could think of to tackle
that problem was to drive it off the relation sizes, but the presence
or absence of expensive functions in the target list can wildly change
the point at which parallelism potentially becomes useful. So we end
up sometimes wasting effort generating partial paths that are totally
useless, and at other times failing to generate partial paths that
would have been useful. (Really, I'd like to generate a lot more
partial paths than we do, trying out various numbers of workers, but
that would just make the existing problem worse.)
I have wondered about doing a preliminary pass over the tree where we
try to make a crude estimate of the amount of effort involved, and
then planning for real with that number in hand. But it seems like
there's so little information you could get at that early stage that
it would be hard to decide anything useful on that basis. You've got
to at least have relation sizes, and really you need some estimate of
the result cardinality as well. It seems like we currently can't
figure out cardinality without also computing paths, and I've wondered
if we could split those into two separate phases. But every time I
think about trying to do that I realize that my pain tolerance isn't
that high.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
David,
I tried to implement your suggestions, here are the patches.
The first patch mostly moves some code around without changing
functionality. It modifies innerrel_is_unique to not only cache the fact
that the relation is unique, but also
cache the index that guarantees uniqueness.
The second patch adds the unique self join removal code. It goes along
the lines of your plan. I didn't even have to examine join clauses
because the constraints imposed by innerrel_is_unique are strong enough
to guarantee that when it finds the same unique index for both
relations, the join can be removed. Namely, it requires mergejoinable
equality clauses for all columns of a unique index.
As a simple benchmark, I measured the duration of query_planner and
remove_useless_self_joins with clock_gettime() on the regression tests.
The following table shows average times in microseconds, median over 11
runs. First row is with this patch, and the second row doesn't call
remove_useless_self_joins and just calls clock_gettime to measure its
overhead.
query_planner remove_useless_self_joins
with removal 39.61 0.61
no removal 39.45 0.38
So, on this workload, unique self join removal adds about 0.2 mcs, or
0.6% of total time, to query_planner. I also tried a query that joins 26
relations, remove_useless_self_joins takes about 40 mcs. Still, this
time grows quadratically with number of relations we have to process, so
in the final patch I limit it to join_collapse_limit, which brings the
time down to 15 mcs. This is negligible compared to the total
query_planner time, which for 8 relations is about 3 ms, that is, 3
orders of magnitude higher.
These benchmarks mostly measure the path where we don't remove any
joins. I didn't time the join removal itself, because it wins quite some
time by allowing to plan one relation and one join less.
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0002-Remove-unique-self-joins-v2.patchtext/x-patch; name=0002-Remove-unique-self-joins-v2.patchDownload
From ad5f5b59265762b3674068ee3656baec3ec66b70 Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Date: Wed, 13 Jun 2018 21:45:32 +0300
Subject: [PATCH 2/2] Remove unique self joins.
---
src/backend/optimizer/plan/analyzejoins.c | 513 ++++++++++++++++++++++++++++++
src/backend/optimizer/plan/planmain.c | 5 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/join.out | 38 ++-
src/test/regress/sql/join.sql | 16 +
5 files changed, 569 insertions(+), 4 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index c03010c..f593a17 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,12 +22,15 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/predtest.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "optimizer/var.h"
#include "utils/lsyscache.h"
@@ -1122,3 +1125,513 @@ is_innerrel_unique_for(PlannerInfo *root,
/* Let rel_is_distinct_for() do the hard work */
return rel_is_distinct_for(root, innerrel, clause_list, unique_index);
}
+
+static bool
+set_varno_walker(Node *node, void *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ ((Var *) node)->varno = (Index) (intptr_t) context;
+
+ return expression_tree_walker(node, set_varno_walker, context);
+}
+
+/*
+ * Replace varno with 'relid' for all Vars in the expression.
+ */
+static void
+set_varno(Expr *expr, Index relid)
+{
+ set_varno_walker((Node *) expr, (void*) (intptr_t) relid);
+}
+
+static bool
+references_relation_walker(Node *node, void *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ return ((Var *) node)->varno == (Index) (intptr_t) context;
+
+ return expression_tree_walker(node, references_relation_walker, context);
+}
+
+/*
+ * Check whether the expression has any Vars from the relation identified
+ * by 'relid'. For EquivalenceClass nodes, checks the equivalence members.
+ */
+static bool
+references_relation(Node *node, Index relid)
+{
+ if (IsA(node, EquivalenceClass))
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) node;
+ ListCell *lc;
+ foreach (lc, ec->ec_members)
+ {
+ if (references_relation(
+ (Node *) ((EquivalenceMember *) lfirst(lc))->em_expr,
+ relid))
+ {
+ return true;
+ }
+ }
+ return false;
+ }
+ return references_relation_walker(node, (void *) (intptr_t) relid);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *prev, *cell, *next;
+ List *toAppend;
+
+ /*
+ * Join clauses become restriction clauses, when Vars of the relation we
+ * remove are replaced with corresponding Vars of the one we keep.
+ * Restriction clauses from the removed relation are as well transferred
+ * to the remaining one.
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+
+ /*
+ * Append the filters from the removed relation to the remaining one.
+ */
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *oldRinfo = lfirst_node(RestrictInfo, cell);
+ RestrictInfo *newRinfo = NULL;
+ Expr *newClause = NULL;
+
+ /*
+ * Do not add multiple clauses derived from the same equivalence class.
+ */
+ if (is_redundant_derived_clause(oldRinfo, toKeep->baserestrictinfo))
+ continue;
+
+ /*
+ * Make a copy of the clause and replace the references to the removed
+ * relation with references to the remaining one.
+ */
+ newClause = (Expr *) copyObject(oldRinfo->clause);
+ set_varno(newClause, toKeep->relid);
+
+ /*
+ * After we have replaced the Vars, check that the resulting clause is
+ * not implied by the existing ones.
+ */
+ if (!contain_mutable_functions((Node *) newClause)
+ && predicate_implied_by(list_make1(newClause), toKeep->baserestrictinfo,
+ /* weak = */ false ))
+ continue; /* provably implied by r1 */
+
+ /*
+ * If the clause has the form of "X=X", replace it with null test.
+ */
+ if (oldRinfo->mergeopfamilies)
+ {
+ Assert(is_opclause(newClause));
+ Expr *leftOp = (Expr *) get_leftop(newClause);
+ Expr *rightOp = (Expr *) get_rightop(newClause);
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *test = makeNode(NullTest);
+ test->arg = leftOp;
+ test->nulltesttype = IS_NOT_NULL;
+ test->argisrow = false;
+ test->location = -1;
+ newClause = (Expr *) test;
+ }
+ }
+
+ /*
+ * Finally, correct the relids of the old rinfo, and replace the clause
+ * with the one we just constructed, and append it to the remaining
+ * relation.
+ */
+ newRinfo = copyObject(oldRinfo);
+ newRinfo->clause = newClause;
+ change_relid(&newRinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&newRinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&newRinfo->right_relids, toRemove->relid, toKeep->relid);
+ change_relid(&newRinfo->clause_relids, toRemove->relid, toKeep->relid);
+ toKeep->baserestrictinfo = lappend(toKeep->baserestrictinfo, newRinfo);
+ }
+
+ /*
+ * Remove the relation from the planner data structures.
+ */
+ remove_rel_from_query(root, toRemove->relid, joinrelids);
+
+ /*
+ * The equivalence classes that reference the removed rel can't also
+ * reference anything other than the remaining rel, or else this
+ * optimization wouldn't work (see rel_used_above_join).
+ *
+ * This means we don't have any ECs that can generate join clauses,
+ * and only have the ECs that can generate restrictions. But the restrictions
+ * have been generated already, and we've just transferred them to the
+ * remaining relation from join clauses and from the restrictions of the
+ * removed relation.
+ *
+ * Therefore, we don't need anymore the ECs that reference the removed
+ * relation, and can delete them.
+ */
+ prev = NULL;
+ cell = NULL;
+ next = list_head(root->eq_classes);
+ while (next)
+ {
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ if (references_relation(lfirst(cell), toRemove->relid))
+ {
+ root->eq_classes = list_delete_cell(root->eq_classes, cell, prev);
+ cell = prev;
+ }
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ IndexOptInfo *outeridx = NULL;
+ IndexOptInfo *inneridx = NULL;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outeridx);
+ if (!outeridx)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &inneridx);
+ if (!inneridx)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outeridx->indexoid != inneridx->indexoid)
+ return false;
+
+ /* A sanity check: this is the same index on the same relation. */
+ Assert(root->simple_rte_array[outer->relid]->relid
+ == root->simple_rte_array[inner->relid]->relid);
+
+ return true;
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id.
+ * Each element is a set of relation ids with which this relation
+ * has a special join.
+ */
+ Relids *special_join_rels;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+} UsjScratch;
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns IntList of the relids that were removed.
+ */
+static List *
+remove_self_joins_one_group(PlannerInfo *root, Index *relids, int n, UsjScratch *scratch)
+{
+ Relids joinrelids = scratch->joinrelids;
+ List *result = NIL;
+ int i, o;
+
+ if (n < 2)
+ return NIL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer;
+
+ if (relids[o] == 0)
+ /* Already removed. */
+ continue;
+
+ outer = root->simple_rel_array[relids[o]];
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner;
+ List *restrictlist;
+
+ if (relids[i] == 0)
+ /* Already removed. */
+ continue;
+
+ inner = root->simple_rel_array[relids[i]];
+
+ /*
+ * Unique semi and left joins are processed by reduce_unique_semijoins
+ * and remove_useless_left_joins respectively, so we don't have to
+ * deal with them here. For full and anti joins, this optimization does
+ * not apply. Therefore, we do nothing if these relations have
+ * a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /* A relation can be removed if it is not referenced above the join. */
+ if (rel_used_above_join(root, joinrelids, inner))
+ {
+ if (rel_used_above_join(root, joinrelids, outer))
+ /*
+ * Both relations are referenced above the join, can't remove
+ * either one.
+ */
+ continue;
+ else
+ {
+ /* Remove outer. */
+ remove_self_join_rel(root, joinrelids, restrictlist,
+ inner, outer);
+ result = lappend_int(result, relids[o]);
+ relids[o] = 0;
+ break;
+ }
+ }
+ else
+ {
+ /* Remove inner. */
+ remove_self_join_rel(root, joinrelids, restrictlist,
+ outer, inner);
+ result = lappend_int(result, relids[i]);
+ relids[i] = 0;
+ }
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ return root->simple_rte_array[*left]->relid
+ < root->simple_rte_array[*right]->relid;
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ */
+static void
+remove_self_joins_one_level(PlannerInfo *root, List **joinlist, UsjScratch *scratch)
+{
+ ListCell *lc;
+ List *relidsToRemove = NIL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (lc, *joinlist)
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ /*
+ * We only care about base relations from which we select something.
+ */
+ if (root->simple_rte_array[ref->rtindex]->rtekind == RTE_RELATION
+ && root->simple_rte_array[ref->rtindex]->relkind == RELKIND_RELATION
+ && root->simple_rel_array[ref->rtindex] != NULL)
+ {
+ relid_ascending[n++] = ref->rtindex;
+ }
+
+ /*
+ * Limit the number of joins we process to control the quadratic behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = list_concat(relidsToRemove,
+ remove_self_joins_one_group(root, &relid_ascending[groupStart],
+ i - groupStart, scratch));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = list_concat(relidsToRemove,
+ remove_self_joins_one_group(root, &relid_ascending[groupStart],
+ n - groupStart, scratch));
+
+ /*
+ * Delete the removed relations from joinlist.
+ */
+ foreach(lc, relidsToRemove)
+ {
+ Index indexToRemove = lfirst_int(lc);
+ ListCell *prev = NULL, *next = NULL;
+ ListCell *lc2 = list_head(*joinlist);
+ while (lc2)
+ {
+ next = lnext(lc2);
+ if (castNode(RangeTblRef, lfirst(lc2))->rtindex == indexToRemove)
+ *joinlist = list_delete_cell(*joinlist, lc2, prev);
+ else
+ prev = lc2;
+ lc2 = next;
+ }
+ }
+
+ return;
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(PlannerInfo *root, List **joinlist, UsjScratch *scratch)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node*) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(root, (List **) &lfirst(lc), scratch);
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(root, joinlist, scratch);
+}
+
+/*
+ * Find out which relations have special joins to which.
+ */
+static void
+find_special_joins(PlannerInfo *root, Relids *special_join_rels)
+{
+ ListCell *lc;
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ special_join_rels[rel->relid] =
+ bms_add_members(special_join_rels[rel->relid], info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ special_join_rels[rel->relid] =
+ bms_add_members(special_join_rels[rel->relid], info->min_lefthand);
+ }
+ }
+}
+
+/*
+ * Find and remove unique self joins in the entire join tree.
+ *
+ * First, we cache some data that will be needed later.
+ * Then, for each jointree level, we group all the participating
+ * base relations by their relation Oid. For every pair of relations in a
+ * group, we try to remove the join they make.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist)
+{
+ UsjScratch scratch;
+
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.joinrelids = NULL;
+
+ find_special_joins(root, scratch.special_join_rels);
+ remove_self_joins_recurse(root, joinlist, &scratch);
+}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 8e4abbe..12c3868 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -199,6 +199,11 @@ query_planner(PlannerInfo *root, List *tlist,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 4805f74..501bb9a 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -111,6 +111,7 @@ extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache,
IndexOptInfo **unique_index);
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index dc6262b..05e9403 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4307,11 +4307,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
- One-Time Filter: false
-(2 rows)
+ One-Time Filter: (false AND false)
+ -> Index Scan using parent_pkey on parent p
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4427,6 +4429,34 @@ select * from
----+----+----+----
(0 rows)
+-- test that semi- or inner self-joins on a unique column are removed
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj p
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj p
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index d3ba2a1..7620981 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1527,6 +1527,22 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+-- test that semi- or inner self-joins on a unique column are removed
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
--
-- Test hints given on incorrect column references are useful
--
--
2.7.4
0001-Preparatory-refactoring-v2.patchtext/x-patch; name=0001-Preparatory-refactoring-v2.patchDownload
From 73779316a164bee22c301c5e4d518d4afb92066e Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Date: Wed, 13 Jun 2018 20:56:03 +0300
Subject: [PATCH 1/2] Preparatory refactoring
---
src/backend/nodes/equalfuncs.c | 7 +-
src/backend/optimizer/path/indxpath.c | 16 +-
src/backend/optimizer/path/joinpath.c | 6 +-
src/backend/optimizer/plan/analyzejoins.c | 233 +++++++++++++++---------------
src/backend/optimizer/plan/planmain.c | 2 +-
src/backend/optimizer/util/pathnode.c | 2 +-
src/backend/optimizer/util/relnode.c | 26 ++--
src/include/nodes/relation.h | 6 +-
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 2 +-
src/include/optimizer/planmain.h | 5 +-
11 files changed, 157 insertions(+), 153 deletions(-)
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 6a971d0..fd02dc8 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -166,9 +166,10 @@ _equalVar(const Var *a, const Var *b)
COMPARE_SCALAR_FIELD(vartypmod);
COMPARE_SCALAR_FIELD(varcollid);
COMPARE_SCALAR_FIELD(varlevelsup);
- COMPARE_SCALAR_FIELD(varnoold);
- COMPARE_SCALAR_FIELD(varoattno);
- COMPARE_LOCATION_FIELD(location);
+ /*
+ * Parse location and old varno/varattno may differ even when
+ * the variables are logically the same.
+ */
return true;
}
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index f295558..78b05f1 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2961,10 +2961,10 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
}
/*
- * relation_has_unique_index_for
+ * relation_get_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index, and return that index.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -2982,8 +2982,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* this routine automatically adds in any usable baserestrictinfo clauses.
* (Note that the passed-in restrictlist will be destructively modified!)
*/
-bool
-relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
+IndexOptInfo *
+relation_get_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
{
@@ -2993,7 +2993,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
- return false;
+ return NULL;
/*
* Examine the rel's restriction clauses for usable var = const clauses
@@ -3034,7 +3034,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Short-circuit the easy case */
if (restrictlist == NIL && exprlist == NIL)
- return false;
+ return NULL;
/* Examine each index of the relation ... */
foreach(ic, rel->indexlist)
@@ -3131,10 +3131,10 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all columns of this index? */
if (c == ind->ncolumns)
- return true;
+ return ind;
}
- return false;
+ return NULL;
}
/*
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 642f951..0c11761 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*unique_index*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*unique_index*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 0e73f9c..c03010c 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -39,14 +39,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, IndexOptInfo **unique_index);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ IndexOptInfo **unique_index);
/*
@@ -58,7 +59,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -146,55 +147,17 @@ clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
}
/*
- * join_is_removable
- * Check whether we need not perform this special join at all, because
- * it will just duplicate its left input.
- *
- * This is true for a left join for which the join condition cannot match
- * more than one inner-side row. (There are other possibly interesting
- * cases, but we don't have the infrastructure to prove them.) We also
- * have to check that the inner side doesn't generate any variables needed
- * above the join.
+ * Check whether any attributes of the relation are used above the
+ * join specified by joinrelids.
*/
static bool
-join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
+rel_used_above_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *rel)
{
- int innerrelid;
- RelOptInfo *innerrel;
- Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
/*
- * Must be a non-delaying left join to a single baserel, else we aren't
- * going to be able to do anything with it.
- */
- if (sjinfo->jointype != JOIN_LEFT ||
- sjinfo->delay_upper_joins)
- return false;
-
- if (!bms_get_singleton_member(sjinfo->min_righthand, &innerrelid))
- return false;
-
- innerrel = find_base_rel(root, innerrelid);
-
- /*
- * Before we go to the effort of checking whether any innerrel variables
- * are needed above the join, make a quick check to eliminate cases in
- * which we will surely be unable to prove uniqueness of the innerrel.
- */
- if (!rel_supports_distinctness(root, innerrel))
- return false;
-
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
-
- /*
- * We can't remove the join if any inner-rel attributes are used above the
- * join.
- *
- * Note that this test only detects use of inner-rel attributes in higher
+ * This test only detects use of inner-rel attributes in higher
* join conditions and the target list. There might be such attributes in
* pushed-down conditions at this join, too. We check that case below.
*
@@ -203,101 +166,112 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* theory that the system attributes are somewhat less likely to be wanted
* and should be tested last.
*/
- for (attroff = innerrel->max_attr - innerrel->min_attr;
+ for (attroff = rel->max_attr - rel->min_attr;
attroff >= 0;
attroff--)
{
- if (!bms_is_subset(innerrel->attr_needed[attroff], joinrelids))
- return false;
+ if (!bms_is_subset(rel->attr_needed[attroff], joinrelids))
+ return true;
}
/*
- * Similarly check that the inner rel isn't needed by any PlaceHolderVars
+ * Similarly check that the relation isn't needed by any PlaceHolderVars
* that will be used above the join. We only need to fail if such a PHV
- * actually references some inner-rel attributes; but the correct check
+ * actually references some relation attributes; but the correct check
* for that is relatively expensive, so we first check against ph_eval_at,
- * which must mention the inner rel if the PHV uses any inner-rel attrs as
+ * which must mention the relation if the PHV uses any relation attrs as
* non-lateral references. Note that if the PHV's syntactic scope is just
- * the inner rel, we can't drop the rel even if the PHV is variable-free.
+ * the given relation, we can't drop the rel even if the PHV is variable-free.
*/
foreach(l, root->placeholder_list)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- if (bms_overlap(phinfo->ph_lateral, innerrel->relids))
- return false; /* it references innerrel laterally */
+ if (bms_overlap(phinfo->ph_lateral, rel->relids))
+ return true; /* it references this relation laterally */
if (bms_is_subset(phinfo->ph_needed, joinrelids))
continue; /* PHV is not used above the join */
- if (!bms_overlap(phinfo->ph_eval_at, innerrel->relids))
- continue; /* it definitely doesn't reference innerrel */
- if (bms_is_subset(phinfo->ph_eval_at, innerrel->relids))
- return false; /* there isn't any other place to eval PHV */
+ if (!bms_overlap(phinfo->ph_eval_at, rel->relids))
+ continue; /* it definitely doesn't reference this relation */
+ if (bms_is_subset(phinfo->ph_eval_at, rel->relids))
+ return true; /* there isn't any other place to eval PHV */
if (bms_overlap(pull_varnos((Node *) phinfo->ph_var->phexpr),
- innerrel->relids))
- return false; /* it does reference innerrel */
+ rel->relids))
+ return true; /* it does reference this relation */
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
- foreach(l, innerrel->joininfo)
+ foreach(l, rel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(rel->relid, restrictinfo->clause_relids))
+ return true;
+ }
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
- return false;
- continue; /* else, ignore; not useful here */
- }
+ return false;
+}
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
+/*
+ * join_is_removable
+ * Check whether we need not perform this special join at all, because
+ * it will just duplicate its left input.
+ *
+ * This is true for a left join for which the join condition cannot match
+ * more than one inner-side row. (There are other possibly interesting
+ * cases, but we don't have the infrastructure to prove them.) We also
+ * have to check that the inner side doesn't generate any variables needed
+ * above the join.
+ */
+static bool
+join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
+{
+ int innerrelid;
+ RelOptInfo *innerrel;
+ Relids joinrelids;
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
- }
+ /*
+ * Must be a non-delaying left join to a single baserel, else we aren't
+ * going to be able to do anything with it.
+ */
+ if (sjinfo->jointype != JOIN_LEFT ||
+ sjinfo->delay_upper_joins)
+ return false;
+
+ if (!bms_get_singleton_member(sjinfo->min_righthand, &innerrelid))
+ return false;
+
+ innerrel = find_base_rel(root, innerrelid);
/*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
+ * Before we go to the effort of checking whether any innerrel variables
+ * are needed above the join, make a quick check to eliminate cases in
+ * which we will surely be unable to prove uniqueness of the innerrel.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
+ if (!rel_supports_distinctness(root, innerrel))
+ return false;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
/*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
+ * We can't remove the join if any inner-rel attributes are used above the
+ * join.
*/
- return false;
-}
+ if (rel_used_above_join(root, joinrelids, innerrel))
+ return false;
+
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
+}
/*
* Remove the target relid from the planner's data structures, having
@@ -568,7 +542,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*unique_index*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -601,7 +575,7 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* reference to unique indexes. Make sure there's at least one
* suitable unique index. It must be immediately enforced, and if
* it's a partial index, it must match the query. (Keep these
- * conditions in sync with relation_has_unique_index_for!)
+ * conditions in sync with relation_get_unique_index_for!)
*/
ListCell *lc;
@@ -643,9 +617,13 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ IndexOptInfo **unique_index)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -658,11 +636,14 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_get_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ IndexOptInfo *index = relation_get_unique_index_for(root, rel, clause_list,
+ NIL, NIL);
+ if (unique_index)
+ *unique_index = index;
+ return index != NULL;
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -966,6 +947,9 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If unique_index_out is not null, it is set to point to the index that
+ * guarantees uniqueness of a base relation.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -974,10 +958,15 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ IndexOptInfo **unique_index_out)
{
MemoryContext old_context;
ListCell *lc;
+ IndexOptInfo *unique_index = NULL;
+
+ if (unique_index_out)
+ *unique_index_out = NULL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -999,10 +988,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ Relids unique_for_rels = (Relids) linitial(lfirst(lc));
if (bms_is_subset(unique_for_rels, outerrelids))
+ {
+ if (unique_index_out)
+ *unique_index_out = lsecond(lfirst(lc));
return true; /* Success! */
+ }
}
/*
@@ -1019,7 +1012,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &unique_index))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1033,9 +1026,12 @@ innerrel_is_unique(PlannerInfo *root,
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ list_make2(bms_copy(outerrelids), unique_index));
MemoryContextSwitchTo(old_context);
+ if (unique_index_out)
+ *unique_index_out = unique_index;
+
return true; /* Success! */
}
else
@@ -1081,7 +1077,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ IndexOptInfo **unique_index)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1123,5 +1120,5 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, unique_index);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 7a34abc..8e4abbe 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -190,7 +190,7 @@ query_planner(PlannerInfo *root, List *tlist,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index dbf9adc..e0abc2c 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1583,7 +1583,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
* clauses for the rel, as well.
*/
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
- relation_has_unique_index_for(root, rel, NIL,
+ relation_get_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
sjinfo->semi_operators))
{
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 82b7842..845bb93 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -38,14 +38,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -505,7 +501,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -607,7 +603,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -964,7 +960,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -977,9 +973,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -990,8 +986,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1000,7 +996,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1026,7 +1022,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1036,7 +1032,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 5af4840..362ae10 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -505,8 +505,10 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of (Relids, IndexOptInfo*) lists, where Relids
+ * is a set of other rels for which this one has been proven
+ * unique, and IndexOptInfo* is an index that makes it unique,
+ * if any.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e99ae36..da1cc30 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -271,6 +271,11 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..57ee855 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,7 +71,7 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
* routines to generate index paths
*/
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
-extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
+extern IndexOptInfo *relation_get_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index c8ab028..4805f74 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -103,13 +103,14 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ IndexOptInfo **unique_index);
/*
* prototypes for plan/setrefs.c
--
2.7.4
Here is a current version of the patch, still rather experimental. Since
the previous version, I fixed some bugs and added the possibility to
remove a relation even when it is mentioned in target lists. I have to
rewrite all references to the removed relation in targetlists and the
equivalence classes, so that they point to the remaining relation. I
change RestrictInfos in place, and update attr_needed and reltarget of
the remaining relation. I also update equivalence members, and delete
equivalence classes that become single-member.
I'm posting it a single file now, because all meaningful changes are in
analyzejoins.c anyway.
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-Remove-unique-self-joins-v3.patchtext/x-patch; name=0001-Remove-unique-self-joins-v3.patchDownload
From 2f2be5d5efa0fcef38a0d6dd1225c73a377f88b8 Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Date: Wed, 13 Jun 2018 20:56:03 +0300
Subject: [PATCH] Remove unique self joins.
Remove inner joins of a relation to itself on all columns of some unique
index. Such a join can be replaced with a scan that uses the combined
restrictions from both sides. We can build the required proofs of
uniqueness using the existing innerrel_is_unique machinery with slight
modifications.
---
src/backend/nodes/equalfuncs.c | 7 +-
src/backend/optimizer/path/indxpath.c | 23 +-
src/backend/optimizer/path/joinpath.c | 6 +-
src/backend/optimizer/plan/analyzejoins.c | 819 +++++++++++++++++++++++++++---
src/backend/optimizer/plan/planmain.c | 7 +-
src/backend/optimizer/util/pathnode.c | 3 +-
src/backend/optimizer/util/relnode.c | 26 +-
src/include/nodes/relation.h | 6 +-
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 14 +-
src/include/optimizer/planmain.h | 10 +-
src/test/regress/expected/join.out | 84 ++-
src/test/regress/sql/join.sql | 34 ++
13 files changed, 937 insertions(+), 107 deletions(-)
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 6a971d0..fd02dc8 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -166,9 +166,10 @@ _equalVar(const Var *a, const Var *b)
COMPARE_SCALAR_FIELD(vartypmod);
COMPARE_SCALAR_FIELD(varcollid);
COMPARE_SCALAR_FIELD(varlevelsup);
- COMPARE_SCALAR_FIELD(varnoold);
- COMPARE_SCALAR_FIELD(varoattno);
- COMPARE_LOCATION_FIELD(location);
+ /*
+ * Parse location and old varno/varattno may differ even when
+ * the variables are logically the same.
+ */
return true;
}
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index f295558..4daf3bf 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2964,7 +2964,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueIndexInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -2985,7 +2986,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueIndexInfo **index_info)
{
ListCell *ic;
@@ -3041,6 +3043,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *matched_restrictlist = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3089,6 +3092,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ matched_restrictlist = lappend(matched_restrictlist, rinfo);
break;
}
}
@@ -3131,7 +3135,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all columns of this index? */
if (c == ind->ncolumns)
+ {
+ if (index_info != NULL)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *index_info = palloc(sizeof(UniqueIndexInfo));
+ (*index_info)->index = ind;
+ (*index_info)->clauses = list_copy(matched_restrictlist);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (matched_restrictlist)
+ pfree(matched_restrictlist);
return true;
+ }
+ if (matched_restrictlist)
+ pfree(matched_restrictlist);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 642f951..2edcf22 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 0e73f9c..db12502 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,12 +22,15 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/predtest.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "optimizer/var.h"
#include "utils/lsyscache.h"
@@ -39,14 +42,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueIndexInfo **info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ UniqueIndexInfo **info);
/*
@@ -58,7 +62,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -162,7 +166,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -238,67 +241,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -568,7 +528,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -643,9 +603,13 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueIndexInfo **index_info)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -661,8 +625,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ index_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -966,6 +930,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueIndexInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -974,12 +942,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueIndexInfo **index_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueIndexInfo *index_info;
+
+ if (index_info_out)
+ *index_info_out = NULL;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -999,10 +978,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ Relids unique_for_rels = (Relids) linitial(lfirst(lc));
if (bms_is_subset(unique_for_rels, outerrelids))
+ {
+ if (index_info_out)
+ *index_info_out = lsecond(lfirst(lc));
return true; /* Success! */
+ }
}
/*
@@ -1019,7 +1002,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &index_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1033,9 +1016,12 @@ innerrel_is_unique(PlannerInfo *root,
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ list_make2(bms_copy(outerrelids), index_info));
MemoryContextSwitchTo(old_context);
+ if (index_info_out)
+ *index_info_out = index_info;
+
return true; /* Success! */
}
else
@@ -1081,7 +1067,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueIndexInfo **index_info)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1123,5 +1110,703 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, index_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+ bool found;
+} ChangeVarnoContext;
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ context->found = true;
+ return false;
+ }
+
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ * Return whether any changes were made.
+ */
+static bool
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ context.found = false;
+ change_varno_walker((Node *) expr, &context);
+ return context.found;
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Move EC members from the removed relation to the remaining one,
+ * removing duplicates.
+ */
+static void
+move_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell = NULL;
+ ListCell *next = list_head(ec->ec_members);
+
+ while (next)
+ {
+ EquivalenceMember *em;
+
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ em = lfirst(cell);
+ if (change_varno(em->em_expr, toRemove, toKeep))
+ {
+ /*
+ * If we transferred the equivalence member to another relation,
+ * check that is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ ListCell *lc;
+ foreach (lc, ec->ec_members)
+ {
+ if (lc == cell)
+ continue;
+
+ if (equal(((EquivalenceMember *) lfirst(lc))->em_expr,
+ em->em_expr))
+ {
+ ec->ec_members = list_delete_cell(ec->ec_members, cell, prev);
+ cell = prev;
+ break;
+ }
+ }
+
+ if (lc == NULL)
+ {
+ /*
+ * We get to keep this EquivalenceMember. Correct its relids.
+ * nullable_relids should be empty, because self join removal
+ * only works for inner joins.
+ */
+ Assert(em->em_nullable_relids == NULL);
+ change_relid(&em->em_relids, toRemove, toKeep);
+ }
+ }
+ }
+}
+
+/*
+ * Remove EC sources referencing given relation.
+ */
+static void
+filter_ec_sources(List **sources, Index relToRemove)
+{
+ ListCell *prev = NULL;
+ ListCell *cell = NULL;
+ ListCell *next = list_head(*sources);
+
+ while (next)
+ {
+ RestrictInfo *rinfo;
+
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ rinfo = castNode(RestrictInfo, lfirst(cell));
+
+ if (bms_is_member(relToRemove, rinfo->required_relids))
+ {
+ *sources = list_delete_cell(*sources, cell, prev);
+ cell = prev;
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *prev, *cell, *next;
+ List *toAppend;
+ int i;
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the remaining
+ * relation instead of the removed one. The clauses that require a subset of
+ * joinrelids become restriction clauses of the remaining relation, and
+ * others remain join clauses. We append them to baserestrictinfo and
+ * joininfo respectively, trying not to introduce duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it contains
+ * EC-derived join clauses which must become filter clauses. It is not enough
+ * to just correct the ECs, because the EC-derived restrictions are generated
+ * before join removal (see generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+
+ /* We can't have an EC-derived clause that joins to some third relation */
+ Assert( !(is_join_clause && rinfo->parent_ec != NULL) );
+
+ /*
+ * Do not add multiple clauses derived from the same equivalence class.
+ */
+ if (is_redundant_derived_clause(rinfo, *target))
+ continue;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one.
+ */
+ change_varno(rinfo->clause, toRemove->relid, toKeep->relid);
+
+ /*
+ * After we have replaced the Vars, check that the resulting clause is
+ * not implied by the existing ones.
+ */
+ if (!contain_mutable_functions((Node *) rinfo->clause)
+ && predicate_implied_by(list_make1(rinfo->clause),
+ *target, false /*weak*/ ))
+ continue; /* provably implied by r1 */
+
+ /*
+ * If the clause has the form of "X=X", replace it with null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Assert(is_opclause(rinfo->clause));
+ Expr *leftOp = (Expr *) get_leftop(rinfo->clause);
+ Expr *rightOp = (Expr *) get_rightop(rinfo->clause);
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *test = makeNode(NullTest);
+ test->arg = leftOp;
+ test->nulltesttype = IS_NOT_NULL;
+ test->argisrow = false;
+ test->location = -1;
+ rinfo->clause = (Expr *) test;
+ }
+ }
+
+ /*
+ * Finally, correct the relids of the rinfo, replace the clause with
+ * the one we just constructed, and append it to the remaining relation.
+ */
+ change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid);
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs,
+ node);
+ }
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * Likewise remove references from SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the outer join we're deleting is nested inside
+ * other outer joins: the upper joins' relid sets have to be adjusted. The
+ * RHS of the target outer join will be made empty here, but that's OK
+ * since caller will delete that SpecialJoinInfo entirely.
+ */
+ foreach(cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, toRemove->relid);
+ sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, toRemove->relid);
+ sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, toRemove->relid);
+ sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, toRemove->relid);
+ }
+
+
+ // !!!FIXME what about placeholders and upper-level tlists (e.g. for grouping)?
+ // The placeholders apparently work somehow due to the fact that they reference
+ // the same Var objects that we modify to point to the other relation.
+
+ /*
+ * We must move the equivalence members that reference the removed relation
+ * to the remaining one, being careful not to introduce duplicate members.
+ * If a EC contains one member and is not used for sorting, it can be removed
+ * altogether.
+ */
+ prev = NULL;
+ cell = NULL;
+ next = list_head(root->eq_classes);
+ while (next)
+ {
+ EquivalenceClass *ec;
+
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation,
+ * nothing to be done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC to reference the remaining relation instead
+ * of the removed one.
+ */
+ move_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ if (ec->ec_sortref == 0 && list_length(ec->ec_members) <= 1)
+ {
+ /*
+ * This EC is not used for sorting and contains one member.
+ * It won't generate any join clauses and can be removed.
+ */
+ root->eq_classes = list_delete_cell(root->eq_classes, cell, prev);
+ cell = prev;
+ }
+ else
+ {
+ /*
+ * The updated EC should be kept.
+ *
+ * Some of its source and derived RestrictInfos point to the removed
+ * relation. There is no straightforward way to determine whether
+ * such a RestrictInfo should be removed or switched to the
+ * remaining relation. We remove all of them, and will generate
+ * the correct ones from equivalence members on demand.
+ *
+ * This is not important for the ECs that are only used for sorting,
+ * but we process them too for the sake of consistency.
+ */
+ filter_ec_sources(&ec->ec_sources, toRemove->relid);
+ filter_ec_sources(&ec->ec_derives, toRemove->relid);
+ ec->ec_relids = bms_del_member(ec->ec_relids, toRemove->relid);
+ }
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Remove references to the rel from other baserels' attr_needed arrays.
+ */
+ for (i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0;
+ attroff--)
+ {
+ otherrel->attr_needed[attroff] =
+ bms_del_member(otherrel->attr_needed[attroff], toRemove->relid);
+ }
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueIndexInfo *outeridx = NULL;
+ UniqueIndexInfo *inneridx = NULL;
+ ListCell *outerCell, *innerCell;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outeridx);
+ if (!outeridx)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &inneridx);
+ if (!inneridx)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outeridx->index->indexoid != inneridx->index->indexoid)
+ return false;
+
+ /*
+ * The index clauses must also be the same. The varnos are different, so
+ * make a copy and replace all varnos of one relation with another, so
+ * that we can compare them with equal().
+ */
+ forboth(innerCell, inneridx->clauses, outerCell, outeridx->clauses)
+ {
+ Expr *innerExpr = copyObject(castNode(RestrictInfo, lfirst(innerCell))->clause);
+ Expr *outerExpr = copyObject(castNode(RestrictInfo, lfirst(outerCell))->clause);
+ change_varno(outerExpr, outer->relid, inner->relid);
+ change_varno(innerExpr, outer->relid, inner->relid);
+ if (!equal(outerExpr, innerExpr))
+ {
+ pfree(outerExpr);
+ pfree(innerExpr);
+ return false;
+ }
+ pfree(outerExpr);
+ pfree(innerExpr);
+ }
+
+ return true;
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id.
+ * Each element is a set of relation ids with which this relation
+ * has a special join.
+ */
+ Relids *special_join_rels;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+
+ /*
+ * Top-level targetlist of the query. We have to update any references
+ * it has to the relations we remove.
+ */
+ List *targetlist;
+} UsjScratch;
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns IntList of the relids that were removed.
+ */
+static List *
+remove_self_joins_one_group(PlannerInfo *root, Index *relids, int n, UsjScratch *scratch)
+{
+ Relids joinrelids = scratch->joinrelids;
+ List *result = NIL;
+ int i, o;
+ ListCell *lc;
+
+ if (n < 2)
+ return NIL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any relations
+ * that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one,
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, joinrelids, restrictlist, inner, outer);
+ result = lappend_int(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist.
+ */
+ foreach(lc, scratch->targetlist)
+ change_varno(lfirst(lc), relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ return root->simple_rte_array[*left]->relid
+ < root->simple_rte_array[*right]->relid;
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(PlannerInfo *root, List **joinlist, UsjScratch *scratch)
+{
+ ListCell *lc;
+ List *relidsToRemove = NIL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (lc, *joinlist)
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ /*
+ * We only care about base relations from which we select something.
+ */
+ if (root->simple_rte_array[ref->rtindex]->rtekind == RTE_RELATION
+ && root->simple_rte_array[ref->rtindex]->relkind == RELKIND_RELATION
+ && root->simple_rel_array[ref->rtindex] != NULL)
+ {
+ relid_ascending[n++] = ref->rtindex;
+ }
+
+ /*
+ * Limit the number of joins we process to control the quadratic behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = list_concat(relidsToRemove,
+ remove_self_joins_one_group(root, &relid_ascending[groupStart],
+ i - groupStart, scratch));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = list_concat(relidsToRemove,
+ remove_self_joins_one_group(root, &relid_ascending[groupStart],
+ n - groupStart, scratch));
+
+ /*
+ * Delete the removed relations from joinlist.
+ */
+ foreach(lc, relidsToRemove)
+ {
+ Index indexToRemove = lfirst_int(lc);
+ ListCell *prev = NULL, *next = NULL;
+ ListCell *lc2 = list_head(*joinlist);
+ while (lc2)
+ {
+ next = lnext(lc2);
+ if (castNode(RangeTblRef, lfirst(lc2))->rtindex == indexToRemove)
+ *joinlist = list_delete_cell(*joinlist, lc2, prev);
+ else
+ prev = lc2;
+ lc2 = next;
+ }
+ }
+
+ return;
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(PlannerInfo *root, List **joinlist, UsjScratch *scratch)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node*) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(root, (List **) &lfirst(lc), scratch);
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(root, joinlist, scratch);
+}
+
+/*
+ * Find out which relations have special joins to which.
+ */
+static void
+find_special_joins(PlannerInfo *root, Relids *special_join_rels)
+{
+ ListCell *lc;
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ special_join_rels[rel->relid] =
+ bms_add_members(special_join_rels[rel->relid], info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ special_join_rels[rel->relid] =
+ bms_add_members(special_join_rels[rel->relid], info->min_lefthand);
+ }
+ }
+}
+
+/*
+ * Find and remove unique self joins in the entire join tree.
+ *
+ * We try to find joins where the same physical relation is joined to
+ * itself on all the columns of a unique index. Such joins can be
+ * can be replaced with a scan using the combined filters.
+ *
+ * When we wind such a join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation.
+ *
+ * 'targetlist' is the top-level targetlist of query. We fix any references
+ * it has to the relations we remove.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist, List *targetlist)
+{
+ UsjScratch scratch;
+
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.joinrelids = NULL;
+ scratch.targetlist = targetlist;
+
+ find_special_joins(root, scratch.special_join_rels);
+ remove_self_joins_recurse(root, joinlist, &scratch);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index b05adc7..49c5f61 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -196,7 +196,7 @@ query_planner(PlannerInfo *root, List *tlist,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -205,6 +205,11 @@ query_planner(PlannerInfo *root, List *tlist,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist, tlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index dbf9adc..cb33509 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1585,7 +1585,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index c69740e..dc10ac3 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -38,14 +38,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -542,7 +538,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -644,7 +640,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1001,7 +997,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1014,9 +1010,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1027,8 +1023,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1037,7 +1033,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1063,7 +1059,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1073,7 +1069,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 7cae3fc..6bc8bb0 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -515,8 +515,10 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of (Relids, UniqueIndexInfo*) lists, where Relids
+ * is a set of other rels for which this one has been proven
+ * unique, and UniqueIndexInfo* stores information about the
+ * index that makes it unique, if any.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 4ba358e..d78923a 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -272,6 +272,11 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..3c74e13 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,9 +71,21 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
* routines to generate index paths
*/
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
+
+/*
+ * UniqueIndexInfo describes a unique index and its corresponding clauses
+ * that guarantee the uniqueness of a relation.
+ */
+typedef struct UniqueIndexInfo
+{
+ IndexOptInfo *index;
+ List *clauses;
+} UniqueIndexInfo;
+
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueIndexInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index c8ab028..701aeab 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -103,13 +103,19 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
+
+typedef struct UniqueIndexInfo UniqueIndexInfo;
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueIndexInfo **index_info);
+
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree,
+ List *tlist);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index dc6262b..37944a0 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4307,11 +4307,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4428,6 +4430,67 @@ select * from
(0 rows)
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- equality to different constants
+create unique index sj_ab on sj(a, b);
+explain (costs off)
+select * from sj t1, sj t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sj t1
+ Filter: (b = 1)
+ -> Seq Scan on sj t2
+ Filter: (b = 2)
+(6 rows)
+
+--
-- Test hints given on incorrect column references are useful
--
select t1.uunique1 from
@@ -5871,6 +5934,8 @@ left join j2 on j1.id1 = j2.id1 where j1.id2 = 1;
Output: j2.id1, j2.id2
(8 rows)
+-- !!!FIXME this test doesn't break if I set skip_mark_restore to true.
+-- Also should avoid unique self join removal by using two different relations.
-- validate logic in merge joins which skips mark and restore.
-- it should only do this if all quals which were used to detect the unique
-- are present as join quals, and not plain quals.
@@ -5882,14 +5947,11 @@ create index j1_id1_idx on j1 (id1) where id1 % 1000 = 1;
explain (costs off) select * from j1 j1
inner join j1 j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1;
- QUERY PLAN
---------------------------------------------
- Merge Join
- Merge Cond: (j1.id1 = j2.id1)
- Join Filter: (j1.id2 = j2.id2)
- -> Index Scan using j1_id1_idx on j1
- -> Index Scan using j1_id1_idx on j1 j2
-(5 rows)
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Seq Scan on j1 j2
+ Filter: ((id1 IS NOT NULL) AND (id2 IS NOT NULL) AND ((id1 % 1000) = 1))
+(2 rows)
select * from j1 j1
inner join j1 j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index d3ba2a1..6f18c73 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1528,6 +1528,37 @@ select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- equality to different constants
+create unique index sj_ab on sj(a, b);
+
+explain (costs off)
+select * from sj t1, sj t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+--
-- Test hints given on incorrect column references are useful
--
@@ -1968,6 +1999,9 @@ explain (verbose, costs off)
select * from j1
left join j2 on j1.id1 = j2.id1 where j1.id2 = 1;
+-- !!!FIXME this test doesn't break if I set skip_mark_restore to true.
+-- Also should avoid unique self join removal by using two different relations.
+
-- validate logic in merge joins which skips mark and restore.
-- it should only do this if all quals which were used to detect the unique
-- are present as join quals, and not plain quals.
--
2.7.4
On Sat, Jul 28, 2018 at 12:26 AM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:
Here is a current version of the patch, still rather experimental.
Hi Alexander,
The eval-qual-plan isolation test is failing:
- checking 1050 checking 600
+ checking 600 checking 600
That's the result of a self join with EPQ on one side of the join:
SELECT * FROM accounts a1, accounts a2
WHERE a1.accountid = a2.accountid
FOR UPDATE OF a1;
I think you need to disable the optimisation when there is a locking
clause on one side. Maybe it could be allowed if it's on both sides?
Not sure.
+ Assert(is_opclause(rinfo->clause));
+ Expr *leftOp = (Expr *) get_leftop(rinfo->clause);
You can't declare a variable here in C89.
--
Thomas Munro
http://www.enterprisedb.com
On 07/29/2018 01:42 PM, Thomas Munro wrote:
I think you need to disable the optimisation when there is a locking
clause on one side. Maybe it could be allowed if it's on both sides?
Not sure.
Hi Thomas,
Thanks for testing. I think it should be OK to remove the join if the
row marks for both sides have the same type. I added the code to check
for that.
You can't declare a variable here in C89.
Fixed.
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-Remove-unique-self-joins-v4.patchtext/x-patch; name=0001-Remove-unique-self-joins-v4.patchDownload
From a72a36567b8f0855f223196aca31dd051bd0d7e4 Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Date: Wed, 13 Jun 2018 20:56:03 +0300
Subject: [PATCH 1/2] Remove unique self joins.
Remove inner joins of a relation to itself on all columns of some unique
index. Such a join can be replaced with a scan that uses the combined
restrictions from both sides. We can build the required proofs of
uniqueness using the existing innerrel_is_unique machinery with slight
modifications.
---
src/backend/nodes/equalfuncs.c | 7 +-
src/backend/optimizer/path/indxpath.c | 23 +-
src/backend/optimizer/path/joinpath.c | 6 +-
src/backend/optimizer/plan/analyzejoins.c | 890 +++++++++++++++++++++++++++---
src/backend/optimizer/plan/planmain.c | 7 +-
src/backend/optimizer/util/pathnode.c | 3 +-
src/backend/optimizer/util/relnode.c | 26 +-
src/include/nodes/relation.h | 6 +-
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 14 +-
src/include/optimizer/planmain.h | 10 +-
src/test/regress/expected/join.out | 84 ++-
src/test/regress/sql/join.sql | 34 ++
13 files changed, 1008 insertions(+), 107 deletions(-)
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 378f2fa..ee30619 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -166,9 +166,10 @@ _equalVar(const Var *a, const Var *b)
COMPARE_SCALAR_FIELD(vartypmod);
COMPARE_SCALAR_FIELD(varcollid);
COMPARE_SCALAR_FIELD(varlevelsup);
- COMPARE_SCALAR_FIELD(varnoold);
- COMPARE_SCALAR_FIELD(varoattno);
- COMPARE_LOCATION_FIELD(location);
+ /*
+ * Parse location and old varno/varattno may differ even when
+ * the variables are logically the same.
+ */
return true;
}
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index f295558..4daf3bf 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2964,7 +2964,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueIndexInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -2985,7 +2986,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueIndexInfo **index_info)
{
ListCell *ic;
@@ -3041,6 +3043,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *matched_restrictlist = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3089,6 +3092,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ matched_restrictlist = lappend(matched_restrictlist, rinfo);
break;
}
}
@@ -3131,7 +3135,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all columns of this index? */
if (c == ind->ncolumns)
+ {
+ if (index_info != NULL)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *index_info = palloc(sizeof(UniqueIndexInfo));
+ (*index_info)->index = ind;
+ (*index_info)->clauses = list_copy(matched_restrictlist);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (matched_restrictlist)
+ pfree(matched_restrictlist);
return true;
+ }
+ if (matched_restrictlist)
+ pfree(matched_restrictlist);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 642f951..2edcf22 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 0e73f9c..5e053d5 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,12 +22,15 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/predtest.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "optimizer/var.h"
#include "utils/lsyscache.h"
@@ -39,14 +42,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueIndexInfo **info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ UniqueIndexInfo **info);
/*
@@ -58,7 +62,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -162,7 +166,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -238,67 +241,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -568,7 +528,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -643,9 +603,13 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueIndexInfo **index_info)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -661,8 +625,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ index_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -966,6 +930,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueIndexInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -974,12 +942,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueIndexInfo **index_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueIndexInfo *index_info;
+
+ if (index_info_out)
+ *index_info_out = NULL;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -999,10 +978,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ Relids unique_for_rels = (Relids) linitial(lfirst(lc));
if (bms_is_subset(unique_for_rels, outerrelids))
+ {
+ if (index_info_out)
+ *index_info_out = lsecond(lfirst(lc));
return true; /* Success! */
+ }
}
/*
@@ -1019,7 +1002,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &index_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1033,9 +1016,12 @@ innerrel_is_unique(PlannerInfo *root,
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ list_make2(bms_copy(outerrelids), index_info));
MemoryContextSwitchTo(old_context);
+ if (index_info_out)
+ *index_info_out = index_info;
+
return true; /* Success! */
}
else
@@ -1081,7 +1067,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueIndexInfo **index_info)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1123,5 +1110,774 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, index_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+ bool found;
+} ChangeVarnoContext;
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ context->found = true;
+ return false;
+ }
+
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ * Return whether any changes were made.
+ */
+static bool
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ context.found = false;
+ change_varno_walker((Node *) expr, &context);
+ return context.found;
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Move EC members from the removed relation to the remaining one,
+ * removing duplicates.
+ */
+static void
+move_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell = NULL;
+ ListCell *next = list_head(ec->ec_members);
+
+ while (next)
+ {
+ EquivalenceMember *em;
+
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ em = lfirst(cell);
+ if (change_varno(em->em_expr, toRemove, toKeep))
+ {
+ /*
+ * If we transferred the equivalence member to another relation,
+ * check that is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ ListCell *lc;
+ foreach (lc, ec->ec_members)
+ {
+ if (lc == cell)
+ continue;
+
+ if (equal(((EquivalenceMember *) lfirst(lc))->em_expr,
+ em->em_expr))
+ {
+ ec->ec_members = list_delete_cell(ec->ec_members, cell, prev);
+ cell = prev;
+ break;
+ }
+ }
+
+ if (lc == NULL)
+ {
+ /*
+ * We get to keep this EquivalenceMember. Correct its relids.
+ * nullable_relids should be empty, because self join removal
+ * only works for inner joins.
+ */
+ Assert(em->em_nullable_relids == NULL);
+ change_relid(&em->em_relids, toRemove, toKeep);
+ }
+ }
+ }
+}
+
+/*
+ * Remove EC sources referencing given relation.
+ */
+static void
+filter_ec_sources(List **sources, Index relToRemove)
+{
+ ListCell *prev = NULL;
+ ListCell *cell = NULL;
+ ListCell *next = list_head(*sources);
+
+ while (next)
+ {
+ RestrictInfo *rinfo;
+
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ rinfo = castNode(RestrictInfo, lfirst(cell));
+
+ if (bms_is_member(relToRemove, rinfo->required_relids))
+ {
+ *sources = list_delete_cell(*sources, cell, prev);
+ cell = prev;
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id.
+ * Each element is a set of relation ids with which this relation
+ * has a special join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+
+ /*
+ * Top-level targetlist of the query. We have to update any references
+ * it has to the relations we remove.
+ */
+ List *targetlist;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *prev, *cell, *next;
+ List *toAppend;
+ int i;
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the remaining
+ * relation instead of the removed one. The clauses that require a subset of
+ * joinrelids become restriction clauses of the remaining relation, and
+ * others remain join clauses. We append them to baserestrictinfo and
+ * joininfo respectively, trying not to introduce duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it contains
+ * EC-derived join clauses which must become filter clauses. It is not enough
+ * to just correct the ECs, because the EC-derived restrictions are generated
+ * before join removal (see generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+
+ /* We can't have an EC-derived clause that joins to some third relation */
+ Assert( !(is_join_clause && rinfo->parent_ec != NULL) );
+
+ /*
+ * Do not add multiple clauses derived from the same equivalence class.
+ */
+ if (is_redundant_derived_clause(rinfo, *target))
+ continue;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one.
+ */
+ change_varno(rinfo->clause, toRemove->relid, toKeep->relid);
+
+ /*
+ * After we have replaced the Vars, check that the resulting clause is
+ * not implied by the existing ones.
+ */
+ if (!contain_mutable_functions((Node *) rinfo->clause)
+ && predicate_implied_by(list_make1(rinfo->clause),
+ *target, false /*weak*/ ))
+ continue; /* provably implied by r1 */
+
+ /*
+ * If the clause has the form of "X=X", replace it with null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Expr *leftOp = (Expr *) get_leftop(rinfo->clause);
+ Expr *rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *test = makeNode(NullTest);
+ test->arg = leftOp;
+ test->nulltesttype = IS_NOT_NULL;
+ test->argisrow = false;
+ test->location = -1;
+ rinfo->clause = (Expr *) test;
+ }
+ }
+
+ /*
+ * Finally, correct the relids of the rinfo, replace the clause with
+ * the one we just constructed, and append it to the remaining relation.
+ */
+ change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid);
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs,
+ node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise remove references from SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the outer join we're deleting is nested inside
+ * other outer joins: the upper joins' relid sets have to be adjusted. The
+ * RHS of the target outer join will be made empty here, but that's OK
+ * since caller will delete that SpecialJoinInfo entirely.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, toRemove->relid);
+ sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, toRemove->relid);
+ sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, toRemove->relid);
+ sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, toRemove->relid);
+ }
+
+
+ // !!!FIXME what about placeholders and upper-level tlists (e.g. for grouping)?
+ // The placeholders apparently work somehow due to the fact that they reference
+ // the same Var objects that we modify to point to the other relation.
+
+ /*
+ * We must move the equivalence members that reference the removed relation
+ * to the remaining one, being careful not to introduce duplicate members.
+ * If a EC contains one member and is not used for sorting, it can be removed
+ * altogether.
+ */
+ prev = NULL;
+ cell = NULL;
+ next = list_head(root->eq_classes);
+ while (next)
+ {
+ EquivalenceClass *ec;
+
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation,
+ * nothing to be done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC to reference the remaining relation instead
+ * of the removed one.
+ */
+ move_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ if (ec->ec_sortref == 0 && list_length(ec->ec_members) <= 1)
+ {
+ /*
+ * This EC is not used for sorting and contains one member.
+ * It won't generate any join clauses and can be removed.
+ */
+ root->eq_classes = list_delete_cell(root->eq_classes, cell, prev);
+ cell = prev;
+ }
+ else
+ {
+ /*
+ * The updated EC should be kept.
+ *
+ * Some of its source and derived RestrictInfos point to the removed
+ * relation. There is no straightforward way to determine whether
+ * such a RestrictInfo should be removed or switched to the
+ * remaining relation. We remove all of them, and will generate
+ * the correct ones from equivalence members on demand.
+ *
+ * This is not important for the ECs that are only used for sorting,
+ * but we process them too for the sake of consistency.
+ */
+ filter_ec_sources(&ec->ec_sources, toRemove->relid);
+ filter_ec_sources(&ec->ec_derives, toRemove->relid);
+ ec->ec_relids = bms_del_member(ec->ec_relids, toRemove->relid);
+ }
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Remove references to the rel from other baserels' attr_needed arrays.
+ */
+ for (i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0;
+ attroff--)
+ {
+ otherrel->attr_needed[attroff] =
+ bms_del_member(otherrel->attr_needed[attroff], toRemove->relid);
+ }
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueIndexInfo *outeridx = NULL;
+ UniqueIndexInfo *inneridx = NULL;
+ ListCell *outerCell, *innerCell;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outeridx);
+ if (!outeridx)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &inneridx);
+ if (!inneridx)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outeridx->index->indexoid != inneridx->index->indexoid)
+ return false;
+
+ /*
+ * The index clauses must also be the same. The varnos are different, so
+ * make a copy and replace all varnos of one relation with another, so
+ * that we can compare them with equal().
+ */
+ forboth(innerCell, inneridx->clauses, outerCell, outeridx->clauses)
+ {
+ Expr *innerExpr = copyObject(castNode(RestrictInfo, lfirst(innerCell))->clause);
+ Expr *outerExpr = copyObject(castNode(RestrictInfo, lfirst(outerCell))->clause);
+ change_varno(outerExpr, outer->relid, inner->relid);
+ change_varno(innerExpr, outer->relid, inner->relid);
+ if (!equal(outerExpr, innerExpr))
+ {
+ pfree(outerExpr);
+ pfree(innerExpr);
+ return false;
+ }
+ pfree(outerExpr);
+ pfree(innerExpr);
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns IntList of the relids that were removed.
+ */
+static List *
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ List *result = NIL;
+ int i, o;
+ ListCell *lc;
+
+ if (n < 2)
+ return NIL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any relations
+ * that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * We can remove either relation, so remove the outer one,
+ * to simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = lappend_int(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist.
+ */
+ foreach(lc, scratch->targetlist)
+ change_varno(lfirst(lc), relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ return root->simple_rte_array[*left]->relid
+ < root->simple_rte_array[*right]->relid;
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ List *relidsToRemove = NIL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (lc, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /* This optimization won't work for tables that have inheritance children. */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /* Limit the number of joins we process to control the quadratic behavior. */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = list_concat(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = list_concat(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /*
+ * Delete the removed relations from joinlist.
+ */
+ foreach(lc, relidsToRemove)
+ {
+ Index indexToRemove = lfirst_int(lc);
+ ListCell *prev = NULL, *next = NULL;
+ ListCell *lc2 = list_head(*joinlist);
+ while (lc2)
+ {
+ next = lnext(lc2);
+ if (castNode(RangeTblRef, lfirst(lc2))->rtindex == indexToRemove)
+ *joinlist = list_delete_cell(*joinlist, lc2, prev);
+ else
+ prev = lc2;
+ lc2 = next;
+ }
+ }
+
+ return;
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node*) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove unique self joins in the entire join tree.
+ *
+ * We try to find joins where the same physical relation is joined to
+ * itself on all the columns of a unique index. Such joins can be
+ * can be replaced with a scan using the combined filters.
+ *
+ * When we wind such a join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation.
+ *
+ * 'targetlist' is the top-level targetlist of query. We fix any references
+ * it has to the relations we remove.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist, List *targetlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+ scratch.targetlist = targetlist;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index b05adc7..49c5f61 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -196,7 +196,7 @@ query_planner(PlannerInfo *root, List *tlist,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -205,6 +205,11 @@ query_planner(PlannerInfo *root, List *tlist,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist, tlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index c5aaaf5..fe4f574 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1585,7 +1585,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index c69740e..dc10ac3 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -38,14 +38,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -542,7 +538,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -644,7 +640,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1001,7 +997,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1014,9 +1010,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1027,8 +1023,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1037,7 +1033,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1063,7 +1059,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1073,7 +1069,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 41caf87..7d90fd0 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -515,8 +515,10 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of (Relids, UniqueIndexInfo*) lists, where Relids
+ * is a set of other rels for which this one has been proven
+ * unique, and UniqueIndexInfo* stores information about the
+ * index that makes it unique, if any.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 7c5ff22..930155d 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -271,6 +271,11 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..3c74e13 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,9 +71,21 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
* routines to generate index paths
*/
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
+
+/*
+ * UniqueIndexInfo describes a unique index and its corresponding clauses
+ * that guarantee the uniqueness of a relation.
+ */
+typedef struct UniqueIndexInfo
+{
+ IndexOptInfo *index;
+ List *clauses;
+} UniqueIndexInfo;
+
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueIndexInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index c8ab028..701aeab 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -103,13 +103,19 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
+
+typedef struct UniqueIndexInfo UniqueIndexInfo;
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueIndexInfo **index_info);
+
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree,
+ List *tlist);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index dc6262b..37944a0 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4307,11 +4307,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4428,6 +4430,67 @@ select * from
(0 rows)
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- equality to different constants
+create unique index sj_ab on sj(a, b);
+explain (costs off)
+select * from sj t1, sj t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sj t1
+ Filter: (b = 1)
+ -> Seq Scan on sj t2
+ Filter: (b = 2)
+(6 rows)
+
+--
-- Test hints given on incorrect column references are useful
--
select t1.uunique1 from
@@ -5871,6 +5934,8 @@ left join j2 on j1.id1 = j2.id1 where j1.id2 = 1;
Output: j2.id1, j2.id2
(8 rows)
+-- !!!FIXME this test doesn't break if I set skip_mark_restore to true.
+-- Also should avoid unique self join removal by using two different relations.
-- validate logic in merge joins which skips mark and restore.
-- it should only do this if all quals which were used to detect the unique
-- are present as join quals, and not plain quals.
@@ -5882,14 +5947,11 @@ create index j1_id1_idx on j1 (id1) where id1 % 1000 = 1;
explain (costs off) select * from j1 j1
inner join j1 j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1;
- QUERY PLAN
---------------------------------------------
- Merge Join
- Merge Cond: (j1.id1 = j2.id1)
- Join Filter: (j1.id2 = j2.id2)
- -> Index Scan using j1_id1_idx on j1
- -> Index Scan using j1_id1_idx on j1 j2
-(5 rows)
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Seq Scan on j1 j2
+ Filter: ((id1 IS NOT NULL) AND (id2 IS NOT NULL) AND ((id1 % 1000) = 1))
+(2 rows)
select * from j1 j1
inner join j1 j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index d3ba2a1..6f18c73 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1528,6 +1528,37 @@ select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- equality to different constants
+create unique index sj_ab on sj(a, b);
+
+explain (costs off)
+select * from sj t1, sj t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+--
-- Test hints given on incorrect column references are useful
--
@@ -1968,6 +1999,9 @@ explain (verbose, costs off)
select * from j1
left join j2 on j1.id1 = j2.id1 where j1.id2 = 1;
+-- !!!FIXME this test doesn't break if I set skip_mark_restore to true.
+-- Also should avoid unique self join removal by using two different relations.
+
-- validate logic in merge joins which skips mark and restore.
-- it should only do this if all quals which were used to detect the unique
-- are present as join quals, and not plain quals.
--
2.7.4
Here is a rebased version of the patch. It includes some fixes after an
off-list review by Konstantin Knizhnik.
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
remove-self-joins-v5.patchtext/x-patch; name=remove-self-joins-v5.patchDownload
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 3bb91c9..62d46a1 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -166,10 +166,13 @@ _equalVar(const Var *a, const Var *b)
COMPARE_SCALAR_FIELD(vartypmod);
COMPARE_SCALAR_FIELD(varcollid);
COMPARE_SCALAR_FIELD(varlevelsup);
- COMPARE_SCALAR_FIELD(varnoold);
- COMPARE_SCALAR_FIELD(varoattno);
COMPARE_LOCATION_FIELD(location);
+ /*
+ * varnoold/varoattno are used only for debugging and may differ even
+ * when the variables are logically the same.
+ */
+
return true;
}
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index f295558..fb07ca0 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2964,7 +2964,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueIndexInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -2985,7 +2986,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueIndexInfo **index_info)
{
ListCell *ic;
@@ -3041,6 +3043,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *matched_restrictlist = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3089,6 +3092,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ matched_restrictlist = lappend(matched_restrictlist, rinfo);
break;
}
}
@@ -3131,7 +3135,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all columns of this index? */
if (c == ind->ncolumns)
+ {
+ if (index_info != NULL)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *index_info = palloc(sizeof(UniqueIndexInfo));
+ (*index_info)->index = ind;
+ (*index_info)->clauses = list_copy(matched_restrictlist);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (matched_restrictlist)
+ list_free(matched_restrictlist);
return true;
+ }
+ if (matched_restrictlist)
+ list_free(matched_restrictlist);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 642f951..2edcf22 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 0e73f9c..2bb3a10 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,12 +22,15 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/predtest.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "optimizer/var.h"
#include "utils/lsyscache.h"
@@ -39,14 +42,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueIndexInfo **info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ UniqueIndexInfo **info);
/*
@@ -58,7 +62,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -162,7 +166,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -238,67 +241,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -568,7 +528,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -643,9 +603,13 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueIndexInfo **index_info)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -661,8 +625,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ index_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -966,6 +930,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueIndexInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -974,12 +942,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueIndexInfo **index_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueIndexInfo *index_info;
+
+ if (index_info_out)
+ *index_info_out = NULL;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -999,10 +978,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ Relids unique_for_rels = (Relids) linitial(lfirst(lc));
if (bms_is_subset(unique_for_rels, outerrelids))
+ {
+ if (index_info_out)
+ *index_info_out = lsecond(lfirst(lc));
return true; /* Success! */
+ }
}
/*
@@ -1019,7 +1002,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &index_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1033,9 +1016,12 @@ innerrel_is_unique(PlannerInfo *root,
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ list_make2(bms_copy(outerrelids), index_info));
MemoryContextSwitchTo(old_context);
+ if (index_info_out)
+ *index_info_out = index_info;
+
return true; /* Success! */
}
else
@@ -1081,7 +1067,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueIndexInfo **index_info)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1123,5 +1110,776 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, index_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+ bool found;
+} ChangeVarnoContext;
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ context->found = true;
+ return false;
+ }
+
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ * Return whether any changes were made.
+ */
+static bool
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ context.found = false;
+ change_varno_walker((Node *) expr, &context);
+ return context.found;
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Move EC members from the removed relation to the remaining one,
+ * removing duplicates.
+ */
+static void
+move_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell = NULL;
+ ListCell *next = list_head(ec->ec_members);
+
+ while (next)
+ {
+ EquivalenceMember *em;
+
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ em = lfirst(cell);
+ if (change_varno(em->em_expr, toRemove, toKeep))
+ {
+ /*
+ * If we transferred the equivalence member to another relation,
+ * check that is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ ListCell *lc;
+ foreach (lc, ec->ec_members)
+ {
+ if (lc == cell)
+ continue;
+
+ if (equal(((EquivalenceMember *) lfirst(lc))->em_expr,
+ em->em_expr))
+ {
+ ec->ec_members = list_delete_cell(ec->ec_members, cell, prev);
+ cell = prev;
+ break;
+ }
+ }
+
+ if (lc == NULL)
+ {
+ /*
+ * We get to keep this EquivalenceMember. Correct its relids.
+ * nullable_relids should be empty, because self join removal
+ * only works for inner joins.
+ */
+ Assert(em->em_nullable_relids == NULL);
+ change_relid(&em->em_relids, toRemove, toKeep);
+ }
+ }
+ }
+}
+
+/*
+ * Remove EC sources referencing given relation.
+ */
+static void
+filter_ec_sources(List **sources, Index relToRemove)
+{
+ ListCell *prev = NULL;
+ ListCell *cell = NULL;
+ ListCell *next = list_head(*sources);
+
+ while (next)
+ {
+ RestrictInfo *rinfo;
+
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ rinfo = castNode(RestrictInfo, lfirst(cell));
+
+ if (bms_is_member(relToRemove, rinfo->required_relids))
+ {
+ *sources = list_delete_cell(*sources, cell, prev);
+ cell = prev;
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id.
+ * Each element is a set of relation ids with which this relation
+ * has a special join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+
+ /*
+ * Top-level targetlist of the query. We have to update any references
+ * it has to the relations we remove.
+ */
+ List *targetlist;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *prev, *cell, *next;
+ List *toAppend;
+ int i;
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the remaining
+ * relation instead of the removed one. The clauses that require a subset of
+ * joinrelids become restriction clauses of the remaining relation, and
+ * others remain join clauses. We append them to baserestrictinfo and
+ * joininfo respectively, trying not to introduce duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it contains
+ * EC-derived join clauses which must become filter clauses. It is not enough
+ * to just correct the ECs, because the EC-derived restrictions are generated
+ * before join removal (see generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+
+ /* We can't have an EC-derived clause that joins to some third relation */
+ Assert( !(is_join_clause && rinfo->parent_ec != NULL) );
+
+ /*
+ * Do not add multiple clauses derived from the same equivalence class.
+ */
+ if (is_redundant_derived_clause(rinfo, *target))
+ continue;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one.
+ */
+ change_varno(rinfo->clause, toRemove->relid, toKeep->relid);
+
+ /*
+ * After we have replaced the Vars, check that the resulting clause is
+ * not implied by the existing ones.
+ */
+ if (!contain_mutable_functions((Node *) rinfo->clause)
+ && predicate_implied_by(list_make1(rinfo->clause),
+ *target, false /*weak*/ ))
+ continue; /* provably implied by r1 */
+
+ /*
+ * If the clause has the form of "X=X", replace it with null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Expr *leftOp = (Expr *) get_leftop(rinfo->clause);
+ Expr *rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *test = makeNode(NullTest);
+ test->arg = leftOp;
+ test->nulltesttype = IS_NOT_NULL;
+ test->argisrow = false;
+ test->location = -1;
+ rinfo->clause = (Expr *) test;
+ }
+ }
+
+ /*
+ * Finally, correct the relids of the rinfo, replace the clause with
+ * the one we just constructed, and append it to the remaining relation.
+ */
+ change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid);
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs,
+ node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise remove references from SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the outer join we're deleting is nested inside
+ * other outer joins: the upper joins' relid sets have to be adjusted. The
+ * RHS of the target outer join will be made empty here, but that's OK
+ * since caller will delete that SpecialJoinInfo entirely.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, toRemove->relid);
+ sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, toRemove->relid);
+ sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, toRemove->relid);
+ sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, toRemove->relid);
+ }
+
+
+ // !!!FIXME what about placeholders and upper-level tlists (e.g. for grouping)?
+ // The placeholders apparently work somehow due to the fact that they reference
+ // the same Var objects that we modify to point to the other relation.
+
+ /*
+ * We must move the equivalence members that reference the removed relation
+ * to the remaining one, being careful not to introduce duplicate members.
+ * If a EC contains one member and is not used for sorting, it can be removed
+ * altogether.
+ */
+ prev = NULL;
+ cell = NULL;
+ next = list_head(root->eq_classes);
+ while (next)
+ {
+ EquivalenceClass *ec;
+
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation,
+ * nothing to be done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC to reference the remaining relation instead
+ * of the removed one.
+ */
+ move_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ if (ec->ec_sortref == 0 && list_length(ec->ec_members) <= 1)
+ {
+ /*
+ * This EC is not used for sorting and contains one member.
+ * It won't generate any join clauses and can be removed.
+ */
+ root->eq_classes = list_delete_cell(root->eq_classes, cell, prev);
+ cell = prev;
+ }
+ else
+ {
+ /*
+ * The updated EC should be kept.
+ *
+ * Some of its source and derived RestrictInfos point to the removed
+ * relation. There is no straightforward way to determine whether
+ * such a RestrictInfo should be removed or switched to the
+ * remaining relation. We remove all of them, and will generate
+ * the correct ones from equivalence members on demand.
+ *
+ * This is not important for the ECs that are only used for sorting,
+ * but we process them too for the sake of consistency.
+ */
+ filter_ec_sources(&ec->ec_sources, toRemove->relid);
+ filter_ec_sources(&ec->ec_derives, toRemove->relid);
+ ec->ec_relids = bms_del_member(ec->ec_relids, toRemove->relid);
+ }
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * There may be references to the removed rel in other baserels' attr_needed
+ * arrays. Switch them to point to the remaining rel.
+ */
+ for (i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0;
+ attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueIndexInfo *outeridx = NULL;
+ UniqueIndexInfo *inneridx = NULL;
+ ListCell *outerCell, *innerCell;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outeridx);
+ if (!outeridx)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &inneridx);
+ if (!inneridx)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outeridx->index->indexoid != inneridx->index->indexoid)
+ return false;
+
+ /*
+ * The index clauses must also be the same. The varnos are different, so
+ * make a copy and replace all varnos of one relation with another, so
+ * that we can compare them with equal().
+ */
+ forboth(innerCell, inneridx->clauses, outerCell, outeridx->clauses)
+ {
+ Expr *innerExpr = copyObject(castNode(RestrictInfo, lfirst(innerCell))->clause);
+ Expr *outerExpr = copyObject(castNode(RestrictInfo, lfirst(outerCell))->clause);
+ change_varno(outerExpr, outer->relid, inner->relid);
+ change_varno(innerExpr, outer->relid, inner->relid);
+ if (!equal(outerExpr, innerExpr))
+ {
+ pfree(outerExpr);
+ pfree(innerExpr);
+ return false;
+ }
+ pfree(outerExpr);
+ pfree(innerExpr);
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns IntList of the relids that were removed.
+ */
+static List *
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ List *result = NIL;
+ int i, o;
+ ListCell *lc;
+
+ if (n < 2)
+ return NIL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any relations
+ * that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * We can remove either relation, so remove the outer one,
+ * to simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = lappend_int(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist.
+ */
+ foreach(lc, scratch->targetlist)
+ change_varno(lfirst(lc), relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+ return l < r ? 1 : ( l == r ? 0 : -1 );
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ List *relidsToRemove = NIL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (lc, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /* This optimization won't work for tables that have inheritance children. */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /* Limit the number of joins we process to control the quadratic behavior. */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = list_concat(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = list_concat(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /*
+ * Delete the removed relations from joinlist.
+ */
+ foreach(lc, relidsToRemove)
+ {
+ Index indexToRemove = lfirst_int(lc);
+ ListCell *prev = NULL, *next = NULL;
+ ListCell *lc2 = list_head(*joinlist);
+ while (lc2)
+ {
+ next = lnext(lc2);
+ if (castNode(RangeTblRef, lfirst(lc2))->rtindex == indexToRemove)
+ *joinlist = list_delete_cell(*joinlist, lc2, prev);
+ else
+ prev = lc2;
+ lc2 = next;
+ }
+ }
+
+ return;
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node*) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove unique self joins in the entire join tree.
+ *
+ * We try to find joins where the same physical relation is joined to
+ * itself on all the columns of a unique index. Such joins can be
+ * can be replaced with a scan using the combined filters.
+ *
+ * When we wind such a join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation.
+ *
+ * 'targetlist' is the top-level targetlist of query. We fix any references
+ * it has to the relations we remove.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist, List *targetlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+ scratch.targetlist = targetlist;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index b05adc7..49c5f61 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -196,7 +196,7 @@ query_planner(PlannerInfo *root, List *tlist,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -205,6 +205,11 @@ query_planner(PlannerInfo *root, List *tlist,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist, tlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 83008d7..0527420 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -112,15 +112,14 @@ assign_param_for_var(PlannerInfo *root, Var *var)
/*
* This comparison must match _equalVar(), except for ignoring
- * varlevelsup. Note that _equalVar() ignores the location.
+ * varlevelsup. Note that _equalVar() ignores the location and
+ * old varno/varattno.
*/
if (pvar->varno == var->varno &&
pvar->varattno == var->varattno &&
pvar->vartype == var->vartype &&
pvar->vartypmod == var->vartypmod &&
- pvar->varcollid == var->varcollid &&
- pvar->varnoold == var->varnoold &&
- pvar->varoattno == var->varoattno)
+ pvar->varcollid == var->varcollid)
return pitem->paramId;
}
}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d50d86b..3f73b38 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1585,7 +1585,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 39f5729..0ea4a02 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -38,14 +38,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -548,7 +544,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -651,7 +647,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -973,7 +969,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -986,9 +982,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -999,8 +995,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1009,7 +1005,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1035,7 +1031,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1082,7 +1078,7 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 88d3723..d2dc20c 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -514,8 +514,10 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of (Relids, UniqueIndexInfo*) lists, where Relids
+ * is a set of other rels for which this one has been proven
+ * unique, and UniqueIndexInfo* stores information about the
+ * index that makes it unique, if any.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 81abcf5..6814660 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -271,6 +271,11 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..ad4a2ad 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,9 +71,20 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
* routines to generate index paths
*/
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
+/*
+ * UniqueIndexInfo describes a unique index and its corresponding clauses
+ * that guarantee the uniqueness of a relation.
+ */
+typedef struct UniqueIndexInfo
+{
+ IndexOptInfo *index;
+ List *clauses;
+} UniqueIndexInfo;
+
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueIndexInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index c8ab028..b9b00cc 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -14,6 +14,7 @@
#ifndef PLANMAIN_H
#define PLANMAIN_H
+#include "optimizer/paths.h"
#include "nodes/plannodes.h"
#include "nodes/relation.h"
@@ -103,13 +104,18 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
+
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueIndexInfo **index_info);
+
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree,
+ List *tlist);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index dc6262b..0fd00db 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4307,11 +4307,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4428,6 +4430,113 @@ select * from
(0 rows)
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- equality to different constants
+create unique index sj_ab on sj(a, b);
+explain (costs off)
+select * from sj t1, sj t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sj t1
+ Filter: (b = 1)
+ -> Seq Scan on sj t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------------
+ Hash Join
+ Hash Cond: (k1.b = j2.b)
+ -> Merge Join
+ Merge Cond: (k1.a = k2.a)
+ -> Index Scan using sk_a_idx on sk k1
+ -> Materialize
+ -> Index Only Scan using sk_a_idx on sk k2
+ -> Hash
+ -> Index Only Scan using sj_ab on sj j2
+ Index Cond: (a IS NOT NULL)
+(10 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------------
+ Hash Join
+ Hash Cond: (k1.b = j2.b)
+ -> Merge Join
+ Merge Cond: (k1.a = k2.a)
+ -> Index Scan using sk_a_idx on sk k1
+ -> Materialize
+ -> Index Only Scan using sk_a_idx on sk k2
+ -> Hash
+ -> Index Only Scan using sj_ab on sj j2
+ Index Cond: (a IS NOT NULL)
+(10 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+--
-- Test hints given on incorrect column references are useful
--
select t1.uunique1 from
@@ -5871,6 +5980,8 @@ left join j2 on j1.id1 = j2.id1 where j1.id2 = 1;
Output: j2.id1, j2.id2
(8 rows)
+-- !!!FIXME this test doesn't break if I set skip_mark_restore to true.
+-- Also should avoid unique self join removal by using two different relations.
-- validate logic in merge joins which skips mark and restore.
-- it should only do this if all quals which were used to detect the unique
-- are present as join quals, and not plain quals.
@@ -5882,14 +5993,11 @@ create index j1_id1_idx on j1 (id1) where id1 % 1000 = 1;
explain (costs off) select * from j1 j1
inner join j1 j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1;
- QUERY PLAN
---------------------------------------------
- Merge Join
- Merge Cond: (j1.id1 = j2.id1)
- Join Filter: (j1.id2 = j2.id2)
- -> Index Scan using j1_id1_idx on j1
- -> Index Scan using j1_id1_idx on j1 j2
-(5 rows)
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Seq Scan on j1 j2
+ Filter: ((id1 IS NOT NULL) AND (id2 IS NOT NULL) AND ((id1 % 1000) = 1))
+(2 rows)
select * from j1 j1
inner join j1 j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index d3ba2a1..44b2f55 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1528,6 +1528,56 @@ select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- equality to different constants
+create unique index sj_ab on sj(a, b);
+
+explain (costs off)
+select * from sj t1, sj t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+--
-- Test hints given on incorrect column references are useful
--
@@ -1968,6 +2018,9 @@ explain (verbose, costs off)
select * from j1
left join j2 on j1.id1 = j2.id1 where j1.id2 = 1;
+-- !!!FIXME this test doesn't break if I set skip_mark_restore to true.
+-- Also should avoid unique self join removal by using two different relations.
+
-- validate logic in merge joins which skips mark and restore.
-- it should only do this if all quals which were used to detect the unique
-- are present as join quals, and not plain quals.
Here is a version that compiles.
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
remove-self-join-v6.patchtext/x-patch; name=remove-self-join-v6.patchDownload
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 3bb91c9..62d46a1 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -166,10 +166,13 @@ _equalVar(const Var *a, const Var *b)
COMPARE_SCALAR_FIELD(vartypmod);
COMPARE_SCALAR_FIELD(varcollid);
COMPARE_SCALAR_FIELD(varlevelsup);
- COMPARE_SCALAR_FIELD(varnoold);
- COMPARE_SCALAR_FIELD(varoattno);
COMPARE_LOCATION_FIELD(location);
+ /*
+ * varnoold/varoattno are used only for debugging and may differ even
+ * when the variables are logically the same.
+ */
+
return true;
}
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index f295558..fb07ca0 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2964,7 +2964,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueIndexInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -2985,7 +2986,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueIndexInfo **index_info)
{
ListCell *ic;
@@ -3041,6 +3043,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *matched_restrictlist = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3089,6 +3092,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ matched_restrictlist = lappend(matched_restrictlist, rinfo);
break;
}
}
@@ -3131,7 +3135,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all columns of this index? */
if (c == ind->ncolumns)
+ {
+ if (index_info != NULL)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *index_info = palloc(sizeof(UniqueIndexInfo));
+ (*index_info)->index = ind;
+ (*index_info)->clauses = list_copy(matched_restrictlist);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (matched_restrictlist)
+ list_free(matched_restrictlist);
return true;
+ }
+ if (matched_restrictlist)
+ list_free(matched_restrictlist);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 642f951..2edcf22 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 0e73f9c..2bb3a10 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,12 +22,15 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/predtest.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "optimizer/var.h"
#include "utils/lsyscache.h"
@@ -39,14 +42,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueIndexInfo **info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ UniqueIndexInfo **info);
/*
@@ -58,7 +62,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -162,7 +166,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -238,67 +241,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -568,7 +528,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -643,9 +603,13 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueIndexInfo **index_info)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -661,8 +625,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ index_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -966,6 +930,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueIndexInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -974,12 +942,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueIndexInfo **index_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueIndexInfo *index_info;
+
+ if (index_info_out)
+ *index_info_out = NULL;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -999,10 +978,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ Relids unique_for_rels = (Relids) linitial(lfirst(lc));
if (bms_is_subset(unique_for_rels, outerrelids))
+ {
+ if (index_info_out)
+ *index_info_out = lsecond(lfirst(lc));
return true; /* Success! */
+ }
}
/*
@@ -1019,7 +1002,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &index_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1033,9 +1016,12 @@ innerrel_is_unique(PlannerInfo *root,
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ list_make2(bms_copy(outerrelids), index_info));
MemoryContextSwitchTo(old_context);
+ if (index_info_out)
+ *index_info_out = index_info;
+
return true; /* Success! */
}
else
@@ -1081,7 +1067,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueIndexInfo **index_info)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1123,5 +1110,776 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, index_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+ bool found;
+} ChangeVarnoContext;
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ context->found = true;
+ return false;
+ }
+
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ * Return whether any changes were made.
+ */
+static bool
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ context.found = false;
+ change_varno_walker((Node *) expr, &context);
+ return context.found;
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Move EC members from the removed relation to the remaining one,
+ * removing duplicates.
+ */
+static void
+move_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell = NULL;
+ ListCell *next = list_head(ec->ec_members);
+
+ while (next)
+ {
+ EquivalenceMember *em;
+
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ em = lfirst(cell);
+ if (change_varno(em->em_expr, toRemove, toKeep))
+ {
+ /*
+ * If we transferred the equivalence member to another relation,
+ * check that is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ ListCell *lc;
+ foreach (lc, ec->ec_members)
+ {
+ if (lc == cell)
+ continue;
+
+ if (equal(((EquivalenceMember *) lfirst(lc))->em_expr,
+ em->em_expr))
+ {
+ ec->ec_members = list_delete_cell(ec->ec_members, cell, prev);
+ cell = prev;
+ break;
+ }
+ }
+
+ if (lc == NULL)
+ {
+ /*
+ * We get to keep this EquivalenceMember. Correct its relids.
+ * nullable_relids should be empty, because self join removal
+ * only works for inner joins.
+ */
+ Assert(em->em_nullable_relids == NULL);
+ change_relid(&em->em_relids, toRemove, toKeep);
+ }
+ }
+ }
+}
+
+/*
+ * Remove EC sources referencing given relation.
+ */
+static void
+filter_ec_sources(List **sources, Index relToRemove)
+{
+ ListCell *prev = NULL;
+ ListCell *cell = NULL;
+ ListCell *next = list_head(*sources);
+
+ while (next)
+ {
+ RestrictInfo *rinfo;
+
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ rinfo = castNode(RestrictInfo, lfirst(cell));
+
+ if (bms_is_member(relToRemove, rinfo->required_relids))
+ {
+ *sources = list_delete_cell(*sources, cell, prev);
+ cell = prev;
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id.
+ * Each element is a set of relation ids with which this relation
+ * has a special join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+
+ /*
+ * Top-level targetlist of the query. We have to update any references
+ * it has to the relations we remove.
+ */
+ List *targetlist;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *prev, *cell, *next;
+ List *toAppend;
+ int i;
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the remaining
+ * relation instead of the removed one. The clauses that require a subset of
+ * joinrelids become restriction clauses of the remaining relation, and
+ * others remain join clauses. We append them to baserestrictinfo and
+ * joininfo respectively, trying not to introduce duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it contains
+ * EC-derived join clauses which must become filter clauses. It is not enough
+ * to just correct the ECs, because the EC-derived restrictions are generated
+ * before join removal (see generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+
+ /* We can't have an EC-derived clause that joins to some third relation */
+ Assert( !(is_join_clause && rinfo->parent_ec != NULL) );
+
+ /*
+ * Do not add multiple clauses derived from the same equivalence class.
+ */
+ if (is_redundant_derived_clause(rinfo, *target))
+ continue;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one.
+ */
+ change_varno(rinfo->clause, toRemove->relid, toKeep->relid);
+
+ /*
+ * After we have replaced the Vars, check that the resulting clause is
+ * not implied by the existing ones.
+ */
+ if (!contain_mutable_functions((Node *) rinfo->clause)
+ && predicate_implied_by(list_make1(rinfo->clause),
+ *target, false /*weak*/ ))
+ continue; /* provably implied by r1 */
+
+ /*
+ * If the clause has the form of "X=X", replace it with null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Expr *leftOp = (Expr *) get_leftop(rinfo->clause);
+ Expr *rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *test = makeNode(NullTest);
+ test->arg = leftOp;
+ test->nulltesttype = IS_NOT_NULL;
+ test->argisrow = false;
+ test->location = -1;
+ rinfo->clause = (Expr *) test;
+ }
+ }
+
+ /*
+ * Finally, correct the relids of the rinfo, replace the clause with
+ * the one we just constructed, and append it to the remaining relation.
+ */
+ change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid);
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs,
+ node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise remove references from SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the outer join we're deleting is nested inside
+ * other outer joins: the upper joins' relid sets have to be adjusted. The
+ * RHS of the target outer join will be made empty here, but that's OK
+ * since caller will delete that SpecialJoinInfo entirely.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, toRemove->relid);
+ sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, toRemove->relid);
+ sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, toRemove->relid);
+ sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, toRemove->relid);
+ }
+
+
+ // !!!FIXME what about placeholders and upper-level tlists (e.g. for grouping)?
+ // The placeholders apparently work somehow due to the fact that they reference
+ // the same Var objects that we modify to point to the other relation.
+
+ /*
+ * We must move the equivalence members that reference the removed relation
+ * to the remaining one, being careful not to introduce duplicate members.
+ * If a EC contains one member and is not used for sorting, it can be removed
+ * altogether.
+ */
+ prev = NULL;
+ cell = NULL;
+ next = list_head(root->eq_classes);
+ while (next)
+ {
+ EquivalenceClass *ec;
+
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation,
+ * nothing to be done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC to reference the remaining relation instead
+ * of the removed one.
+ */
+ move_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ if (ec->ec_sortref == 0 && list_length(ec->ec_members) <= 1)
+ {
+ /*
+ * This EC is not used for sorting and contains one member.
+ * It won't generate any join clauses and can be removed.
+ */
+ root->eq_classes = list_delete_cell(root->eq_classes, cell, prev);
+ cell = prev;
+ }
+ else
+ {
+ /*
+ * The updated EC should be kept.
+ *
+ * Some of its source and derived RestrictInfos point to the removed
+ * relation. There is no straightforward way to determine whether
+ * such a RestrictInfo should be removed or switched to the
+ * remaining relation. We remove all of them, and will generate
+ * the correct ones from equivalence members on demand.
+ *
+ * This is not important for the ECs that are only used for sorting,
+ * but we process them too for the sake of consistency.
+ */
+ filter_ec_sources(&ec->ec_sources, toRemove->relid);
+ filter_ec_sources(&ec->ec_derives, toRemove->relid);
+ ec->ec_relids = bms_del_member(ec->ec_relids, toRemove->relid);
+ }
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * There may be references to the removed rel in other baserels' attr_needed
+ * arrays. Switch them to point to the remaining rel.
+ */
+ for (i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0;
+ attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueIndexInfo *outeridx = NULL;
+ UniqueIndexInfo *inneridx = NULL;
+ ListCell *outerCell, *innerCell;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outeridx);
+ if (!outeridx)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &inneridx);
+ if (!inneridx)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outeridx->index->indexoid != inneridx->index->indexoid)
+ return false;
+
+ /*
+ * The index clauses must also be the same. The varnos are different, so
+ * make a copy and replace all varnos of one relation with another, so
+ * that we can compare them with equal().
+ */
+ forboth(innerCell, inneridx->clauses, outerCell, outeridx->clauses)
+ {
+ Expr *innerExpr = copyObject(castNode(RestrictInfo, lfirst(innerCell))->clause);
+ Expr *outerExpr = copyObject(castNode(RestrictInfo, lfirst(outerCell))->clause);
+ change_varno(outerExpr, outer->relid, inner->relid);
+ change_varno(innerExpr, outer->relid, inner->relid);
+ if (!equal(outerExpr, innerExpr))
+ {
+ pfree(outerExpr);
+ pfree(innerExpr);
+ return false;
+ }
+ pfree(outerExpr);
+ pfree(innerExpr);
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns IntList of the relids that were removed.
+ */
+static List *
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ List *result = NIL;
+ int i, o;
+ ListCell *lc;
+
+ if (n < 2)
+ return NIL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any relations
+ * that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * We can remove either relation, so remove the outer one,
+ * to simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = lappend_int(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist.
+ */
+ foreach(lc, scratch->targetlist)
+ change_varno(lfirst(lc), relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+ return l < r ? 1 : ( l == r ? 0 : -1 );
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ List *relidsToRemove = NIL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (lc, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /* This optimization won't work for tables that have inheritance children. */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /* Limit the number of joins we process to control the quadratic behavior. */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = list_concat(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = list_concat(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /*
+ * Delete the removed relations from joinlist.
+ */
+ foreach(lc, relidsToRemove)
+ {
+ Index indexToRemove = lfirst_int(lc);
+ ListCell *prev = NULL, *next = NULL;
+ ListCell *lc2 = list_head(*joinlist);
+ while (lc2)
+ {
+ next = lnext(lc2);
+ if (castNode(RangeTblRef, lfirst(lc2))->rtindex == indexToRemove)
+ *joinlist = list_delete_cell(*joinlist, lc2, prev);
+ else
+ prev = lc2;
+ lc2 = next;
+ }
+ }
+
+ return;
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node*) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove unique self joins in the entire join tree.
+ *
+ * We try to find joins where the same physical relation is joined to
+ * itself on all the columns of a unique index. Such joins can be
+ * can be replaced with a scan using the combined filters.
+ *
+ * When we wind such a join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation.
+ *
+ * 'targetlist' is the top-level targetlist of query. We fix any references
+ * it has to the relations we remove.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist, List *targetlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+ scratch.targetlist = targetlist;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index b05adc7..49c5f61 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -196,7 +196,7 @@ query_planner(PlannerInfo *root, List *tlist,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -205,6 +205,11 @@ query_planner(PlannerInfo *root, List *tlist,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist, tlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 83008d7..0527420 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -112,15 +112,14 @@ assign_param_for_var(PlannerInfo *root, Var *var)
/*
* This comparison must match _equalVar(), except for ignoring
- * varlevelsup. Note that _equalVar() ignores the location.
+ * varlevelsup. Note that _equalVar() ignores the location and
+ * old varno/varattno.
*/
if (pvar->varno == var->varno &&
pvar->varattno == var->varattno &&
pvar->vartype == var->vartype &&
pvar->vartypmod == var->vartypmod &&
- pvar->varcollid == var->varcollid &&
- pvar->varnoold == var->varnoold &&
- pvar->varoattno == var->varoattno)
+ pvar->varcollid == var->varcollid)
return pitem->paramId;
}
}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d50d86b..3f73b38 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1585,7 +1585,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 39f5729..1b21a6e 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -38,14 +38,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -548,7 +544,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -651,7 +647,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -973,7 +969,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -986,9 +982,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -999,8 +995,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1009,7 +1005,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1035,7 +1031,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1045,7 +1041,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 88d3723..d2dc20c 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -514,8 +514,10 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of (Relids, UniqueIndexInfo*) lists, where Relids
+ * is a set of other rels for which this one has been proven
+ * unique, and UniqueIndexInfo* stores information about the
+ * index that makes it unique, if any.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 81abcf5..6814660 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -271,6 +271,11 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..ad4a2ad 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,9 +71,20 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
* routines to generate index paths
*/
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
+/*
+ * UniqueIndexInfo describes a unique index and its corresponding clauses
+ * that guarantee the uniqueness of a relation.
+ */
+typedef struct UniqueIndexInfo
+{
+ IndexOptInfo *index;
+ List *clauses;
+} UniqueIndexInfo;
+
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueIndexInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index c8ab028..b9b00cc 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -14,6 +14,7 @@
#ifndef PLANMAIN_H
#define PLANMAIN_H
+#include "optimizer/paths.h"
#include "nodes/plannodes.h"
#include "nodes/relation.h"
@@ -103,13 +104,18 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
+
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueIndexInfo **index_info);
+
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree,
+ List *tlist);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 1f53780..86a790f 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4321,11 +4321,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4442,6 +4444,113 @@ select * from
(0 rows)
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- equality to different constants
+create unique index sj_ab on sj(a, b);
+explain (costs off)
+select * from sj t1, sj t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sj t1
+ Filter: (b = 1)
+ -> Seq Scan on sj t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------------
+ Hash Join
+ Hash Cond: (k1.b = j2.b)
+ -> Merge Join
+ Merge Cond: (k1.a = k2.a)
+ -> Index Scan using sk_a_idx on sk k1
+ -> Materialize
+ -> Index Only Scan using sk_a_idx on sk k2
+ -> Hash
+ -> Index Only Scan using sj_ab on sj j2
+ Index Cond: (a IS NOT NULL)
+(10 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------------
+ Hash Join
+ Hash Cond: (k1.b = j2.b)
+ -> Merge Join
+ Merge Cond: (k1.a = k2.a)
+ -> Index Scan using sk_a_idx on sk k1
+ -> Materialize
+ -> Index Only Scan using sk_a_idx on sk k2
+ -> Hash
+ -> Index Only Scan using sj_ab on sj j2
+ Index Cond: (a IS NOT NULL)
+(10 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+--
-- Test hints given on incorrect column references are useful
--
select t1.uunique1 from
@@ -5885,6 +5994,8 @@ left join j2 on j1.id1 = j2.id1 where j1.id2 = 1;
Output: j2.id1, j2.id2
(8 rows)
+-- !!!FIXME this test doesn't break if I set skip_mark_restore to true.
+-- Also should avoid unique self join removal by using two different relations.
-- validate logic in merge joins which skips mark and restore.
-- it should only do this if all quals which were used to detect the unique
-- are present as join quals, and not plain quals.
@@ -5896,14 +6007,11 @@ create index j1_id1_idx on j1 (id1) where id1 % 1000 = 1;
explain (costs off) select * from j1 j1
inner join j1 j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1;
- QUERY PLAN
---------------------------------------------
- Merge Join
- Merge Cond: (j1.id1 = j2.id1)
- Join Filter: (j1.id2 = j2.id2)
- -> Index Scan using j1_id1_idx on j1
- -> Index Scan using j1_id1_idx on j1 j2
-(5 rows)
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Seq Scan on j1 j2
+ Filter: ((id1 IS NOT NULL) AND (id2 IS NOT NULL) AND ((id1 % 1000) = 1))
+(2 rows)
select * from j1 j1
inner join j1 j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 334a4dc..9772e49 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1535,6 +1535,56 @@ select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- equality to different constants
+create unique index sj_ab on sj(a, b);
+
+explain (costs off)
+select * from sj t1, sj t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+--
-- Test hints given on incorrect column references are useful
--
@@ -1975,6 +2025,9 @@ explain (verbose, costs off)
select * from j1
left join j2 on j1.id1 = j2.id1 where j1.id2 = 1;
+-- !!!FIXME this test doesn't break if I set skip_mark_restore to true.
+-- Also should avoid unique self join removal by using two different relations.
+
-- validate logic in merge joins which skips mark and restore.
-- it should only do this if all quals which were used to detect the unique
-- are present as join quals, and not plain quals.
On 19 October 2018 at 01:47, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:
Here is a version that compiles.
I had a quick read through this and I think its missing about a 1-page
comment section detailing when we can and when we cannot remove these
self joins, and what measures we must take when we do remove them.
Apart from that, I noted the following during my read:
1. I don't think this is the right way to do this. There are other
places where we alter the varnoold. For example:
search_indexed_tlist_for_var(). So you should likely be doing that too
rather than working around it.
@@ -166,10 +166,13 @@ _equalVar(const Var *a, const Var *b)
COMPARE_SCALAR_FIELD(vartypmod);
COMPARE_SCALAR_FIELD(varcollid);
COMPARE_SCALAR_FIELD(varlevelsup);
- COMPARE_SCALAR_FIELD(varnoold);
- COMPARE_SCALAR_FIELD(varoattno);
COMPARE_LOCATION_FIELD(location);
+ /*
+ * varnoold/varoattno are used only for debugging and may differ even
+ * when the variables are logically the same.
+ */
+
2. Surely the following loop is incorrect:
for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
{
int attno = i - toKeep->min_attr;
toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
toRemove->attr_needed[attno]);
}
What if toRemove has a lower min_attr or higher max_attr?
3. "wind" -> "find"
+ * When we wind such a join, we mark one of the participating relation as
4. I think the following shouldn't be happening:
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
5. I'd have thought the opposite. Surely there are more chances of
this being useful with more joins?
+ /* Limit the number of joins we process to control the quadratic behavior. */
+ if (n > join_collapse_limit)
+ break;
6. In remove_self_joins_one_level() I think you should collect the
removed relations in a Relids rather than a list.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
El 08/11/18 a las 08:59, David Rowley escribió:
On 19 October 2018 at 01:47, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:Here is a version that compiles.
I had a quick read through this and I think its missing about a 1-page
comment section detailing when we can and when we cannot remove these
self joins, and what measures we must take when we do remove them.
I added some explanation to the comment for remove_useless_joins. This
is probably still not clear enough, so if you have any particular
questions I'll cover them too. While improving the comments, I found
some bugs around the handling of join clauses and broken ECs, so I fixed
them and added the tests.
Apart from that, I noted the following during my read:
1. I don't think this is the right way to do this. There are other
places where we alter the varnoold. For example:
search_indexed_tlist_for_var(). So you should likely be doing that too
rather than working around it.
Fixed.
2. Surely the following loop is incorrect:
for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
{
int attno = i - toKeep->min_attr;
toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
toRemove->attr_needed[attno]);
}What if toRemove has a lower min_attr or higher max_attr?
This shouldn't happen because this is always the same relation, and
max_attr is its physical number of attributes. There is an assertion
about this in remove_self_joins_one_group:
/* A sanity check: the relations have the same Oid. */
Assert(root->simple_rte_array[relids[i]]->relid ==
root->simple_rte_array[relids[o]]->relid);
3. "wind" -> "find"
+ * When we wind such a join, we mark one of the participating relation as
Fixed.
4. I think the following shouldn't be happening:
+------------------------------------------------ Result One-Time Filter: false -(2 rows) + -> Index Scan using parent_pkey on parent x + Index Cond: (k = 1) +(4 rows)
This happens because for join rels, we make some effort to prove that
they are empty and not make any paths for them, and we don't do this for
base rels. When we remove the join, this difference is exposed. Compare
to this query:
postgres=# explain select * from parent where k = 1 and k = 2;
QUERY PLAN
────────────────────────────────────────────────────────────────────────────────
Result (cost=0.15..8.17 rows=1 width=8)
One-Time Filter: false
-> Index Scan using parent_pkey on parent (cost=0.15..8.17 rows=1
width=8)
Index Cond: (k = 1)
(4 rows)
5. I'd have thought the opposite. Surely there are more chances of
this being useful with more joins?+ /* Limit the number of joins we process to control the quadratic behavior. */ + if (n > join_collapse_limit) + break;
That is true, but we also have to think about the overhead when we don't
find any joins to remove. Without this cutoff, we have to examine every
pair of self-joins, so the run time grows quadratically with the number
of such joins in the query. I don't have a better idea on how to control
this.
6. In remove_self_joins_one_level() I think you should collect the
removed relations in a Relids rather than a list.
Done.
--
Alexander Kuzmenkov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
Attachments:
remove-self-join-v7.patchtext/x-patch; name=remove-self-join-v7.patchDownload
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 5f46415..a6196cc 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2994,7 +2994,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueIndexInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3015,7 +3016,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueIndexInfo **index_info)
{
ListCell *ic;
@@ -3071,6 +3073,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *matched_restrictlist = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3119,6 +3122,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ matched_restrictlist = lappend(matched_restrictlist, rinfo);
break;
}
}
@@ -3161,7 +3165,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all columns of this index? */
if (c == ind->ncolumns)
+ {
+ if (index_info != NULL)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *index_info = palloc(sizeof(UniqueIndexInfo));
+ (*index_info)->index = ind;
+ (*index_info)->clauses = list_copy(matched_restrictlist);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (matched_restrictlist)
+ list_free(matched_restrictlist);
return true;
+ }
+ if (matched_restrictlist)
+ list_free(matched_restrictlist);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 642f951..2edcf22 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 0e73f9c..9e1da75 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,12 +22,15 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/predtest.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "optimizer/var.h"
#include "utils/lsyscache.h"
@@ -39,14 +42,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueIndexInfo **info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ UniqueIndexInfo **info);
/*
@@ -58,7 +62,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -162,7 +166,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -238,67 +241,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -568,7 +528,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -643,9 +603,13 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueIndexInfo **index_info)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -661,8 +625,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ index_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -966,6 +930,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueIndexInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -974,12 +942,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueIndexInfo **index_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueIndexInfo *index_info;
+
+ if (index_info_out)
+ *index_info_out = NULL;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -999,10 +978,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ Relids unique_for_rels = (Relids) linitial(lfirst(lc));
if (bms_is_subset(unique_for_rels, outerrelids))
+ {
+ if (index_info_out)
+ *index_info_out = lsecond(lfirst(lc));
return true; /* Success! */
+ }
}
/*
@@ -1019,7 +1002,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &index_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1033,9 +1016,12 @@ innerrel_is_unique(PlannerInfo *root,
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ list_make2(bms_copy(outerrelids), index_info));
MemoryContextSwitchTo(old_context);
+ if (index_info_out)
+ *index_info_out = index_info;
+
return true; /* Success! */
}
else
@@ -1081,7 +1067,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueIndexInfo **index_info)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1123,5 +1110,803 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, index_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ ((Var *) node)->varnoold = context->newRelid;
+ return false;
+ }
+
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell = NULL;
+ ListCell *next = list_head(ec->ec_members);
+
+ while (next)
+ {
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ EquivalenceMember *em = lfirst(cell);
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ continue;
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ Assert(em->em_nullable_relids == NULL);
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ ListCell *otherCell;
+ foreach (otherCell, ec->ec_members)
+ {
+ if (otherCell == cell)
+ continue;
+
+ EquivalenceMember *other = castNode(EquivalenceMember, lfirst(otherCell));
+ if (equal(other->em_expr, em->em_expr))
+ {
+ ec->ec_members = list_delete_cell(ec->ec_members, cell, prev);
+ cell = prev;
+ break;
+ }
+ }
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell = NULL;
+ ListCell *next = list_head(*sources);
+
+ while (next)
+ {
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ continue;
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it
+ * for redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ ListCell *otherCell;
+ foreach (otherCell, *sources)
+ {
+ if (otherCell == cell)
+ continue;
+
+ RestrictInfo *other = castNode(RestrictInfo, lfirst(otherCell));
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ *sources = list_delete_cell(*sources, cell, prev);
+ cell = prev;
+ break;
+ }
+ }
+
+ if (otherCell == NULL)
+ {
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_relid(&rinfo->required_relids, toRemove, toKeep);
+ change_relid(&rinfo->left_relids, toRemove, toKeep);
+ change_relid(&rinfo->right_relids, toRemove, toKeep);
+ change_relid(&rinfo->clause_relids, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id.
+ * Each element is a set of relation ids with which this relation
+ * has a special join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+
+ /*
+ * Top-level targetlist of the query. We have to update any references
+ * it has to the relations we remove.
+ */
+ List *targetlist;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ List *toAppend;
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the remaining
+ * relation instead of the removed one. The clauses that require a subset of
+ * joinrelids become restriction clauses of the remaining relation, and
+ * others remain join clauses. We append them to baserestrictinfo and
+ * joininfo respectively, trying not to introduce duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it contains
+ * EC-derived join clauses which must become filter clauses. It is not enough
+ * to just correct the ECs, because the EC-derived restrictions are generated
+ * before join removal (see generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+
+ /* We can't have an EC-derived clause that joins to some third relation */
+ Assert( !(is_join_clause && rinfo->parent_ec != NULL) );
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_varno(rinfo->clause, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ ListCell *otherCell;
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If the clause has the form of "X=X", replace it with null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Expr *leftOp = (Expr *) get_leftop(rinfo->clause);
+ Expr *rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *test = makeNode(NullTest);
+ test->arg = leftOp;
+ test->nulltesttype = IS_NOT_NULL;
+ test->argisrow = false;
+ test->location = -1;
+ rinfo->clause = (Expr *) test;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs,
+ node);
+ }
+
+ for (int i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise remove references from SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside
+ * some special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, toRemove->relid);
+ sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, toRemove->relid);
+ sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, toRemove->relid);
+ sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, toRemove->relid);
+ }
+
+ // !!!FIXME what about placeholders and upper-level tlists (e.g. for grouping)?
+ // The placeholders apparently work somehow due to the fact that they reference
+ // the same Var objects that we modify to point to the other relation.
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation,
+ * nothing to be done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation
+ * instead of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed
+ * to the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already
+ * processed this list, and updated these clauses to reference
+ * the remaining relation, so we can skip them here based on their
+ * relids.
+ *
+ * Likewise, we have already processed the join clauses that join
+ * the removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation
+ * to some third relation. We could delete just delete them and
+ * generate on demand, but sometimes we can't do this because there
+ * is no suitable equality operator (see the handling of ec_broken).
+ * In such cases we are going to use the source clauses, so we have
+ * to correct them too.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * There may be references to the removed rel in other baserels' attr_needed
+ * arrays. Switch them to point to the remaining rel.
+ */
+ for (int i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0;
+ attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueIndexInfo *outeridx = NULL;
+ UniqueIndexInfo *inneridx = NULL;
+ ListCell *outerCell, *innerCell;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outeridx);
+ if (!outeridx)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &inneridx);
+ if (!inneridx)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outeridx->index->indexoid != inneridx->index->indexoid)
+ return false;
+
+ /*
+ * The clauses that make a relation unique must be the same for both
+ * relations, or else we won't match the same row on each side of join.
+ *
+ * The lists of matching clauses are ordered as the index columns, so we
+ * just compare the list elements one by one. The varnos are different,
+ * so we copy the clauses and replace all mentions of outer varno with the
+ * inner, so that we can use equal().
+ */
+ forboth(innerCell, inneridx->clauses, outerCell, outeridx->clauses)
+ {
+ Expr *innerExpr = copyObject(castNode(RestrictInfo, lfirst(innerCell))->clause);
+ Expr *outerExpr = copyObject(castNode(RestrictInfo, lfirst(outerCell))->clause);
+ change_varno(outerExpr, outer->relid, inner->relid);
+ change_varno(innerExpr, outer->relid, inner->relid);
+ if (!equal(outerExpr, innerExpr))
+ {
+ pfree(outerExpr);
+ pfree(innerExpr);
+ return false;
+ }
+ pfree(outerExpr);
+ pfree(innerExpr);
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+ ListCell *lc;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any relations
+ * that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * We can remove either relation, so remove the outer one,
+ * to simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist.
+ */
+ foreach(lc, scratch->targetlist)
+ change_varno(lfirst(lc), relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+ return l < r ? 1 : ( l == r ? 0 : -1 );
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *prev, *cell, *next;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /* This optimization won't work for tables that have inheritance children. */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /* Limit the number of joins we process to control the quadratic behavior. */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ cell = NULL;
+ next = list_head(*joinlist);
+ while (next)
+ {
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ Node *node = lfirst(cell);
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef*) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell, prev);
+ cell = prev;
+ }
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node*) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist, List *targetlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+ scratch.targetlist = targetlist;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 9b6cc9e..88c338c 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -194,7 +194,7 @@ query_planner(PlannerInfo *root, List *tlist,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -203,6 +203,11 @@ query_planner(PlannerInfo *root, List *tlist,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist, tlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d50d86b..3f73b38 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1585,7 +1585,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 39f5729..1b21a6e 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -38,14 +38,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -548,7 +544,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -651,7 +647,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -973,7 +969,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -986,9 +982,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -999,8 +995,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1009,7 +1005,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1035,7 +1031,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1045,7 +1041,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6fd2420..4161532 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -515,8 +515,10 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of (Relids, UniqueIndexInfo*) lists, where Relids
+ * is a set of other rels for which this one has been proven
+ * unique, and UniqueIndexInfo* stores information about the
+ * index that makes it unique, if any.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 81abcf5..6814660 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -271,6 +271,11 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..ad4a2ad 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,9 +71,20 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
* routines to generate index paths
*/
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
+/*
+ * UniqueIndexInfo describes a unique index and its corresponding clauses
+ * that guarantee the uniqueness of a relation.
+ */
+typedef struct UniqueIndexInfo
+{
+ IndexOptInfo *index;
+ List *clauses;
+} UniqueIndexInfo;
+
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueIndexInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index c8ab028..b9b00cc 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -14,6 +14,7 @@
#ifndef PLANMAIN_H
#define PLANMAIN_H
+#include "optimizer/paths.h"
#include "nodes/plannodes.h"
#include "nodes/relation.h"
@@ -103,13 +104,18 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
+
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueIndexInfo **index_info);
+
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree,
+ List *tlist);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85..a57905f 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 1f53780..578f77a 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4321,11 +4321,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4442,6 +4444,137 @@ select * from
(0 rows)
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- If index conditions are different for each side, we won't select the same
+-- row on both sides, so the join can't be removed.
+create table sl(a int, b int);
+create unique index sl_ab on sl(a, b);
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Bitmap Heap Scan on sl t1
+ Recheck Cond: (b = 1)
+ -> Bitmap Index Scan on sl_ab
+ Index Cond: (b = 1)
+ -> Materialize
+ -> Bitmap Heap Scan on sl t2
+ Recheck Cond: (b = 2)
+ -> Bitmap Index Scan on sl_ab
+ Index Cond: (b = 2)
+(11 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+--
-- Test hints given on incorrect column references are useful
--
select t1.uunique1 from
@@ -5885,6 +6018,8 @@ left join j2 on j1.id1 = j2.id1 where j1.id2 = 1;
Output: j2.id1, j2.id2
(8 rows)
+-- !!!FIXME this test doesn't break if I set skip_mark_restore to true.
+-- Also should avoid unique self join removal by using two different relations.
-- validate logic in merge joins which skips mark and restore.
-- it should only do this if all quals which were used to detect the unique
-- are present as join quals, and not plain quals.
@@ -5896,14 +6031,11 @@ create index j1_id1_idx on j1 (id1) where id1 % 1000 = 1;
explain (costs off) select * from j1 j1
inner join j1 j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1;
- QUERY PLAN
---------------------------------------------
- Merge Join
- Merge Cond: (j1.id1 = j2.id1)
- Join Filter: (j1.id2 = j2.id2)
- -> Index Scan using j1_id1_idx on j1
- -> Index Scan using j1_id1_idx on j1 j2
-(5 rows)
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Seq Scan on j1 j2
+ Filter: ((id1 IS NOT NULL) AND (id2 IS NOT NULL) AND ((id1 % 1000) = 1))
+(2 rows)
select * from j1 j1
inner join j1 j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65d..b1e2483 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 334a4dc..d6bc9ca 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1535,6 +1535,70 @@ select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- If index conditions are different for each side, we won't select the same
+-- row on both sides, so the join can't be removed.
+create table sl(a int, b int);
+create unique index sl_ab on sl(a, b);
+
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
+--
-- Test hints given on incorrect column references are useful
--
@@ -1975,6 +2039,9 @@ explain (verbose, costs off)
select * from j1
left join j2 on j1.id1 = j2.id1 where j1.id2 = 1;
+-- !!!FIXME this test doesn't break if I set skip_mark_restore to true.
+-- Also should avoid unique self join removal by using two different relations.
+
-- validate logic in merge joins which skips mark and restore.
-- it should only do this if all quals which were used to detect the unique
-- are present as join quals, and not plain quals.
Here is a rebased version with some bugfixes.
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-Remove-self-joins-v9.patchtext/x-patch; name=0001-Remove-self-joins-v9.patchDownload
From 5ad486a16346a0141146c488dac74331da777af8 Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Date: Mon, 17 Dec 2018 17:47:18 +0300
Subject: [PATCH] Remove self joins v9
---
src/backend/optimizer/path/indxpath.c | 23 +-
src/backend/optimizer/path/joinpath.c | 6 +-
src/backend/optimizer/plan/analyzejoins.c | 939 +++++++++++++++++++++++++++---
src/backend/optimizer/plan/planmain.c | 7 +-
src/backend/optimizer/util/pathnode.c | 3 +-
src/backend/optimizer/util/relnode.c | 26 +-
src/include/nodes/relation.h | 6 +-
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 13 +-
src/include/optimizer/planmain.h | 10 +-
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 222 ++++++-
src/test/regress/sql/equivclass.sql | 17 +
src/test/regress/sql/join.sql | 93 +++
14 files changed, 1298 insertions(+), 104 deletions(-)
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 5f46415..a6196cc 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -2994,7 +2994,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueIndexInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3015,7 +3016,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueIndexInfo **index_info)
{
ListCell *ic;
@@ -3071,6 +3073,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *matched_restrictlist = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3119,6 +3122,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ matched_restrictlist = lappend(matched_restrictlist, rinfo);
break;
}
}
@@ -3161,7 +3165,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all columns of this index? */
if (c == ind->ncolumns)
+ {
+ if (index_info != NULL)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *index_info = palloc(sizeof(UniqueIndexInfo));
+ (*index_info)->index = ind;
+ (*index_info)->clauses = list_copy(matched_restrictlist);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (matched_restrictlist)
+ list_free(matched_restrictlist);
return true;
+ }
+ if (matched_restrictlist)
+ list_free(matched_restrictlist);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 642f951..2edcf22 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 0e73f9c..605b9fe 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,12 +22,15 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/predtest.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "optimizer/var.h"
#include "utils/lsyscache.h"
@@ -39,14 +42,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueIndexInfo **info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ UniqueIndexInfo **info);
/*
@@ -58,7 +62,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -162,7 +166,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -238,67 +241,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -568,7 +528,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -643,9 +603,13 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueIndexInfo **index_info)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -661,8 +625,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ index_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -966,6 +930,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueIndexInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -974,12 +942,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueIndexInfo **index_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueIndexInfo *index_info;
+
+ if (index_info_out)
+ *index_info_out = NULL;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -999,10 +978,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ Relids unique_for_rels = (Relids) linitial(lfirst(lc));
if (bms_is_subset(unique_for_rels, outerrelids))
+ {
+ if (index_info_out)
+ *index_info_out = lsecond(lfirst(lc));
return true; /* Success! */
+ }
}
/*
@@ -1019,7 +1002,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &index_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1033,9 +1016,12 @@ innerrel_is_unique(PlannerInfo *root,
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ list_make2(bms_copy(outerrelids), index_info));
MemoryContextSwitchTo(old_context);
+ if (index_info_out)
+ *index_info_out = index_info;
+
return true; /* Success! */
}
else
@@ -1081,7 +1067,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueIndexInfo **index_info)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1123,5 +1110,823 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, index_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ ((Var *) node)->varnoold = context->newRelid;
+ return false;
+ }
+
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell = NULL;
+ ListCell *next = list_head(ec->ec_members);
+
+ while (next)
+ {
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ EquivalenceMember *em = lfirst(cell);
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ continue;
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ Assert(em->em_nullable_relids == NULL);
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ ListCell *otherCell;
+ foreach (otherCell, ec->ec_members)
+ {
+ if (otherCell == cell)
+ continue;
+
+ EquivalenceMember *other = castNode(EquivalenceMember, lfirst(otherCell));
+ if (equal(other->em_expr, em->em_expr))
+ {
+ ec->ec_members = list_delete_cell(ec->ec_members, cell, prev);
+ cell = prev;
+ break;
+ }
+ }
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell = NULL;
+ ListCell *next = list_head(*sources);
+
+ while (next)
+ {
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ continue;
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it
+ * for redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ ListCell *otherCell;
+ foreach (otherCell, *sources)
+ {
+ if (otherCell == cell)
+ continue;
+
+ RestrictInfo *other = castNode(RestrictInfo, lfirst(otherCell));
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ *sources = list_delete_cell(*sources, cell, prev);
+ cell = prev;
+ break;
+ }
+ }
+
+ if (otherCell == NULL)
+ {
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_relid(&rinfo->required_relids, toRemove, toKeep);
+ change_relid(&rinfo->left_relids, toRemove, toKeep);
+ change_relid(&rinfo->right_relids, toRemove, toKeep);
+ change_relid(&rinfo->clause_relids, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id.
+ * Each element is a set of relation ids with which this relation
+ * has a special join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+
+ /*
+ * Top-level targetlist of the query. We have to update any references
+ * it has to the relations we remove.
+ */
+ List *targetlist;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ List *toAppend;
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the remaining
+ * relation instead of the removed one. The clauses that require a subset of
+ * joinrelids become restriction clauses of the remaining relation, and
+ * others remain join clauses. We append them to baserestrictinfo and
+ * joininfo respectively, trying not to introduce duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it contains
+ * EC-derived join clauses which must become filter clauses. It is not enough
+ * to just correct the ECs, because the EC-derived restrictions are generated
+ * before join removal (see generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+
+ /* We can't have an EC-derived clause that joins to some third relation */
+ Assert( !(is_join_clause && rinfo->parent_ec != NULL) );
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_varno(rinfo->clause, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ ListCell *otherCell;
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If the clause has the form of "X=X", replace it with null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Expr *leftOp = (Expr *) get_leftop(rinfo->clause);
+ Expr *rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *test = makeNode(NullTest);
+ test->arg = leftOp;
+ test->nulltesttype = IS_NOT_NULL;
+ test->argisrow = false;
+ test->location = -1;
+ rinfo->clause = (Expr *) test;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs,
+ node);
+ }
+
+ for (int i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside
+ * some special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder
+ * can't be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation,
+ * nothing to be done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation
+ * instead of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed
+ * to the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already
+ * processed this list, and updated these clauses to reference
+ * the remaining relation, so we can skip them here based on their
+ * relids.
+ *
+ * Likewise, we have already processed the join clauses that join
+ * the removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation
+ * to some third relation. We could delete just delete them and
+ * generate on demand, but sometimes we can't do this because there
+ * is no suitable equality operator (see the handling of ec_broken).
+ * In such cases we are going to use the source clauses, so we have
+ * to correct them too.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (int i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ foreach (cell, otherrel->lateral_vars)
+ change_varno(lfirst(cell), toRemove->relid, toKeep->relid);
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueIndexInfo *outeridx = NULL;
+ UniqueIndexInfo *inneridx = NULL;
+ ListCell *outerCell, *innerCell;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outeridx);
+ if (!outeridx)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &inneridx);
+ if (!inneridx)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outeridx->index->indexoid != inneridx->index->indexoid)
+ return false;
+
+ /*
+ * The clauses that make a relation unique must be the same for both
+ * relations, or else we won't match the same row on each side of join.
+ *
+ * The lists of matching clauses are ordered as the index columns, so we
+ * just compare the list elements one by one. The varnos are different,
+ * so we copy the clauses and replace all mentions of outer varno with the
+ * inner, so that we can use equal().
+ */
+ forboth(innerCell, inneridx->clauses, outerCell, outeridx->clauses)
+ {
+ Expr *innerExpr = copyObject(castNode(RestrictInfo, lfirst(innerCell))->clause);
+ Expr *outerExpr = copyObject(castNode(RestrictInfo, lfirst(outerCell))->clause);
+ change_varno(outerExpr, outer->relid, inner->relid);
+ change_varno(innerExpr, outer->relid, inner->relid);
+ if (!equal(outerExpr, innerExpr))
+ {
+ pfree(outerExpr);
+ pfree(innerExpr);
+ return false;
+ }
+ pfree(outerExpr);
+ pfree(innerExpr);
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+ ListCell *lc;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any relations
+ * that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * We can remove either relation, so remove the outer one,
+ * to simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist.
+ */
+ foreach(lc, scratch->targetlist)
+ change_varno(lfirst(lc), relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+ return l < r ? 1 : ( l == r ? 0 : -1 );
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *prev, *cell, *next;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /* This optimization won't work for tables that have inheritance children. */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /* Limit the number of joins we process to control the quadratic behavior. */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ cell = NULL;
+ next = list_head(*joinlist);
+ while (next)
+ {
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+
+ Node *node = lfirst(cell);
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef*) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell, prev);
+ cell = prev;
+ }
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node*) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist, List *targetlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+ scratch.targetlist = targetlist;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 9b6cc9e..88c338c 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -194,7 +194,7 @@ query_planner(PlannerInfo *root, List *tlist,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -203,6 +203,11 @@ query_planner(PlannerInfo *root, List *tlist,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist, tlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d50d86b..3f73b38 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1585,7 +1585,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 39f5729..1b21a6e 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -38,14 +38,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -548,7 +544,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -651,7 +647,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -973,7 +969,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -986,9 +982,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -999,8 +995,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1009,7 +1005,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1035,7 +1031,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1045,7 +1041,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6fd2420..4161532 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -515,8 +515,10 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of (Relids, UniqueIndexInfo*) lists, where Relids
+ * is a set of other rels for which this one has been proven
+ * unique, and UniqueIndexInfo* stores information about the
+ * index that makes it unique, if any.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 81abcf5..6814660 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -271,6 +271,11 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cafde30..ad4a2ad 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,9 +71,20 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
* routines to generate index paths
*/
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
+/*
+ * UniqueIndexInfo describes a unique index and its corresponding clauses
+ * that guarantee the uniqueness of a relation.
+ */
+typedef struct UniqueIndexInfo
+{
+ IndexOptInfo *index;
+ List *clauses;
+} UniqueIndexInfo;
+
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueIndexInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index c8ab028..b9b00cc 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -14,6 +14,7 @@
#ifndef PLANMAIN_H
#define PLANMAIN_H
+#include "optimizer/paths.h"
#include "nodes/plannodes.h"
#include "nodes/relation.h"
@@ -103,13 +104,18 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
+
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueIndexInfo **index_info);
+
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree,
+ List *tlist);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85..a57905f 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 1f53780..4a3d21d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4321,11 +4321,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4442,6 +4444,205 @@ select * from
(0 rows)
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- If index conditions are different for each side, we won't select the same
+-- row on both sides, so the join can't be removed.
+create table sl(a int, b int);
+create unique index sl_ab on sl(a, b);
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Bitmap Heap Scan on sl t1
+ Recheck Cond: (b = 1)
+ -> Bitmap Index Scan on sl_ab
+ Index Cond: (b = 1)
+ -> Materialize
+ -> Bitmap Heap Scan on sl t2
+ Recheck Cond: (b = 2)
+ -> Bitmap Index Scan on sl_ab
+ Index Cond: (b = 2)
+(11 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+--
-- Test hints given on incorrect column references are useful
--
select t1.uunique1 from
@@ -5885,6 +6086,8 @@ left join j2 on j1.id1 = j2.id1 where j1.id2 = 1;
Output: j2.id1, j2.id2
(8 rows)
+-- !!!FIXME this test doesn't break if I set skip_mark_restore to true.
+-- Also should avoid unique self join removal by using two different relations.
-- validate logic in merge joins which skips mark and restore.
-- it should only do this if all quals which were used to detect the unique
-- are present as join quals, and not plain quals.
@@ -5896,14 +6099,11 @@ create index j1_id1_idx on j1 (id1) where id1 % 1000 = 1;
explain (costs off) select * from j1 j1
inner join j1 j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
where j1.id1 % 1000 = 1 and j2.id1 % 1000 = 1;
- QUERY PLAN
---------------------------------------------
- Merge Join
- Merge Cond: (j1.id1 = j2.id1)
- Join Filter: (j1.id2 = j2.id2)
- -> Index Scan using j1_id1_idx on j1
- -> Index Scan using j1_id1_idx on j1 j2
-(5 rows)
+ QUERY PLAN
+----------------------------------------------------------------------------
+ Seq Scan on j1 j2
+ Filter: ((id1 IS NOT NULL) AND (id2 IS NOT NULL) AND ((id1 % 1000) = 1))
+(2 rows)
select * from j1 j1
inner join j1 j2 on j1.id1 = j2.id1 and j1.id2 = j2.id2
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65d..b1e2483 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 334a4dc..266141e 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1535,6 +1535,96 @@ select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- If index conditions are different for each side, we won't select the same
+-- row on both sides, so the join can't be removed.
+create table sl(a int, b int);
+create unique index sl_ab on sl(a, b);
+
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
+--
-- Test hints given on incorrect column references are useful
--
@@ -1975,6 +2065,9 @@ explain (verbose, costs off)
select * from j1
left join j2 on j1.id1 = j2.id1 where j1.id2 = 1;
+-- !!!FIXME this test doesn't break if I set skip_mark_restore to true.
+-- Also should avoid unique self join removal by using two different relations.
+
-- validate logic in merge joins which skips mark and restore.
-- it should only do this if all quals which were used to detect the unique
-- are present as join quals, and not plain quals.
--
2.7.4
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
Here is a rebased version with some bugfixes.
I noticed this had bit-rotted again. I've not really reviewed it, but
I rebased it up to HEAD, and fixed a couple small things:
* My compiler was bitching about misplaced declarations, so I moved
some variable declarations accordingly. I couldn't help noticing
that many of those wouldn't have been a problem in the first place
if you were following project style for loops around list_delete_cell
calls, which usually look more like this:
prev = NULL;
for (cell = list_head(root->rowMarks); cell; cell = next)
{
PlanRowMark *rc = (PlanRowMark *) lfirst(cell);
next = lnext(cell);
if (rt_fetch(rc->rti, root->parse->rtable)->rtekind == RTE_RESULT)
root->rowMarks = list_delete_cell(root->rowMarks, cell, prev);
else
prev = cell;
}
* I saw you had a problem with an existing test in join.sql that
was being optimized away because it used an ill-advised self-join.
I've pushed a fix for that, so it's not a problem as of HEAD.
I notice though that there's one unexplained plan change remaining
in join.out:
@@ -4365,11 +4365,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
That sure looks like a bug. I don't have time to look for the
cause right now.
I also noticed that the test results show that when a table
is successfully optimized away, the remaining reference seems
to have the alias of the second reference not the first one.
That seems a little ... weird. It's just cosmetic of course, but
why is that?
Also, I did notice that you'd stuck a declaration for
"struct UniqueIndexInfo" into paths.h, which then compelled you
to include that header in planmain.h. This seems like poor style;
I'd have been inclined to put the struct in pathnodes.h instead.
That's assuming you need it at all -- in those two usages, seems
like it'd be just about as easy to return two separate Lists.
On the other hand, given
+ * unique_for_rels - list of (Relids, UniqueIndexInfo*) lists, where Relids
+ * is a set of other rels for which this one has been proven
+ * unique, and UniqueIndexInfo* stores information about the
+ * index that makes it unique, if any.
I wonder why you didn't include the Relids into UniqueIndexInfo as well
... and maybe make it a proper Node so that unique_for_rels could be
printed by outfuncs.c. So any way I slice it, it seems like this data
structure could use more careful contemplation.
Anyway, updated patch attached.
regards, tom lane
Attachments:
0001-Remove-self-joins-v10.patchtext/x-diff; charset=us-ascii; name=0001-Remove-self-joins-v10.patchDownload
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 3434219..86c9453 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3583,7 +3583,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueIndexInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3604,7 +3605,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueIndexInfo **index_info)
{
ListCell *ic;
@@ -3660,6 +3662,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *matched_restrictlist = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3708,6 +3711,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ matched_restrictlist = lappend(matched_restrictlist, rinfo);
break;
}
}
@@ -3750,7 +3754,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (index_info != NULL)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *index_info = palloc(sizeof(UniqueIndexInfo));
+ (*index_info)->index = ind;
+ (*index_info)->clauses = list_copy(matched_restrictlist);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (matched_restrictlist)
+ list_free(matched_restrictlist);
return true;
+ }
+ if (matched_restrictlist)
+ list_free(matched_restrictlist);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index d8ff4bf..89cd236 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index a4efa69..19e5139 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -39,14 +40,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueIndexInfo **info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ UniqueIndexInfo **info);
/*
@@ -58,7 +60,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -162,7 +164,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -238,67 +239,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -568,7 +526,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -643,9 +601,13 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueIndexInfo **index_info)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -661,8 +623,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ index_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -966,6 +928,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueIndexInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -974,12 +940,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueIndexInfo **index_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueIndexInfo *index_info;
+
+ if (index_info_out)
+ *index_info_out = NULL;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -999,10 +976,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ Relids unique_for_rels = (Relids) linitial(lfirst(lc));
if (bms_is_subset(unique_for_rels, outerrelids))
+ {
+ if (index_info_out)
+ *index_info_out = lsecond(lfirst(lc));
return true; /* Success! */
+ }
}
/*
@@ -1019,7 +1000,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &index_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1033,9 +1014,12 @@ innerrel_is_unique(PlannerInfo *root,
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ list_make2(bms_copy(outerrelids), index_info));
MemoryContextSwitchTo(old_context);
+ if (index_info_out)
+ *index_info_out = index_info;
+
return true; /* Success! */
}
else
@@ -1081,7 +1065,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueIndexInfo **index_info)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1123,5 +1108,830 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, index_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ ((Var *) node)->varnoold = context->newRelid;
+ return false;
+ }
+
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell = NULL;
+ ListCell *next = list_head(ec->ec_members);
+
+ while (next)
+ {
+ EquivalenceMember *em;
+ ListCell *otherCell;
+
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+ em = lfirst(cell);
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ continue;
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ Assert(em->em_nullable_relids == NULL);
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ foreach (otherCell, ec->ec_members)
+ {
+ EquivalenceMember *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(EquivalenceMember, lfirst(otherCell));
+ if (equal(other->em_expr, em->em_expr))
+ {
+ ec->ec_members = list_delete_cell(ec->ec_members, cell, prev);
+ cell = prev;
+ break;
+ }
+ }
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell = NULL;
+ ListCell *next = list_head(*sources);
+
+ while (next)
+ {
+ RestrictInfo *rinfo;
+ ListCell *otherCell;
+
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+ rinfo = castNode(RestrictInfo, lfirst(cell));
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ continue;
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it
+ * for redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach (otherCell, *sources)
+ {
+ RestrictInfo *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(RestrictInfo, lfirst(otherCell));
+ if (equal(rinfo->clause, other->clause))
+ {
+ *sources = list_delete_cell(*sources, cell, prev);
+ cell = prev;
+ break;
+ }
+ }
+
+ if (otherCell == NULL)
+ {
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_relid(&rinfo->required_relids, toRemove, toKeep);
+ change_relid(&rinfo->left_relids, toRemove, toKeep);
+ change_relid(&rinfo->right_relids, toRemove, toKeep);
+ change_relid(&rinfo->clause_relids, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id.
+ * Each element is a set of relation ids with which this relation
+ * has a special join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+
+ /*
+ * Top-level targetlist of the query. We have to update any references
+ * it has to the relations we remove.
+ */
+ List *targetlist;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ List *toAppend;
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the remaining
+ * relation instead of the removed one. The clauses that require a subset of
+ * joinrelids become restriction clauses of the remaining relation, and
+ * others remain join clauses. We append them to baserestrictinfo and
+ * joininfo respectively, trying not to introduce duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it contains
+ * EC-derived join clauses which must become filter clauses. It is not enough
+ * to just correct the ECs, because the EC-derived restrictions are generated
+ * before join removal (see generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+ ListCell *otherCell;
+
+ /* We can't have an EC-derived clause that joins to some third relation */
+ Assert( !(is_join_clause && rinfo->parent_ec != NULL) );
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_varno(rinfo->clause, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If the clause has the form of "X=X", replace it with null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Expr *leftOp = (Expr *) get_leftop(rinfo->clause);
+ Expr *rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *test = makeNode(NullTest);
+ test->arg = leftOp;
+ test->nulltesttype = IS_NOT_NULL;
+ test->argisrow = false;
+ test->location = -1;
+ rinfo->clause = (Expr *) test;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs,
+ node);
+ }
+
+ for (int i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside
+ * some special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder
+ * can't be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation,
+ * nothing to be done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation
+ * instead of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed
+ * to the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already
+ * processed this list, and updated these clauses to reference
+ * the remaining relation, so we can skip them here based on their
+ * relids.
+ *
+ * Likewise, we have already processed the join clauses that join
+ * the removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation
+ * to some third relation. We could delete just delete them and
+ * generate on demand, but sometimes we can't do this because there
+ * is no suitable equality operator (see the handling of ec_broken).
+ * In such cases we are going to use the source clauses, so we have
+ * to correct them too.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (int i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ foreach (cell, otherrel->lateral_vars)
+ change_varno(lfirst(cell), toRemove->relid, toKeep->relid);
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueIndexInfo *outeridx = NULL;
+ UniqueIndexInfo *inneridx = NULL;
+ ListCell *outerCell, *innerCell;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outeridx);
+ if (!outeridx)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &inneridx);
+ if (!inneridx)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outeridx->index->indexoid != inneridx->index->indexoid)
+ return false;
+
+ /*
+ * The clauses that make a relation unique must be the same for both
+ * relations, or else we won't match the same row on each side of join.
+ *
+ * The lists of matching clauses are ordered as the index columns, so we
+ * just compare the list elements one by one. The varnos are different,
+ * so we copy the clauses and replace all mentions of outer varno with the
+ * inner, so that we can use equal().
+ */
+ forboth(innerCell, inneridx->clauses, outerCell, outeridx->clauses)
+ {
+ Expr *innerExpr = copyObject(castNode(RestrictInfo, lfirst(innerCell))->clause);
+ Expr *outerExpr = copyObject(castNode(RestrictInfo, lfirst(outerCell))->clause);
+ change_varno(outerExpr, outer->relid, inner->relid);
+ change_varno(innerExpr, outer->relid, inner->relid);
+ if (!equal(outerExpr, innerExpr))
+ {
+ pfree(outerExpr);
+ pfree(innerExpr);
+ return false;
+ }
+ pfree(outerExpr);
+ pfree(innerExpr);
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+ ListCell *lc;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any relations
+ * that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * We can remove either relation, so remove the outer one,
+ * to simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist.
+ */
+ foreach(lc, scratch->targetlist)
+ change_varno(lfirst(lc), relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+ return l < r ? 1 : ( l == r ? 0 : -1 );
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *prev, *cell, *next;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /* This optimization won't work for tables that have inheritance children. */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /* Limit the number of joins we process to control the quadratic behavior. */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ cell = NULL;
+ next = list_head(*joinlist);
+ while (next)
+ {
+ Node *node;
+
+ prev = cell;
+ cell = next;
+ next = lnext(next);
+ node = lfirst(cell);
+
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef*) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell, prev);
+ cell = prev;
+ }
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node*) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist, List *targetlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+ scratch.targetlist = targetlist;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 3cedd01..8d6036c 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -224,7 +224,7 @@ query_planner(PlannerInfo *root, List *tlist,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -233,6 +233,11 @@ query_planner(PlannerInfo *root, List *tlist,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist, tlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 169e51e..493f425 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1578,7 +1578,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 4130514..fde4118 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -39,14 +39,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -556,7 +552,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -659,7 +655,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -981,7 +977,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -994,9 +990,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1007,8 +1003,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1017,7 +1013,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1043,7 +1039,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1053,7 +1049,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a008ae0..4d1e9ac 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -520,8 +520,10 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of (Relids, UniqueIndexInfo*) lists, where Relids
+ * is a set of other rels for which this one has been proven
+ * unique, and UniqueIndexInfo* stores information about the
+ * index that makes it unique, if any.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 574bb85..7eb59c1 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -287,6 +287,11 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 040335a..725d2c2 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,9 +71,20 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
* routines to generate index paths
*/
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
+/*
+ * UniqueIndexInfo describes a unique index and its corresponding clauses
+ * that guarantee the uniqueness of a relation.
+ */
+typedef struct UniqueIndexInfo
+{
+ IndexOptInfo *index;
+ List *clauses;
+} UniqueIndexInfo;
+
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueIndexInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 3bbdb5e..e697ff6 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -16,6 +16,7 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#include "optimizer/paths.h"
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
@@ -95,13 +96,18 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
+
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueIndexInfo **index_info);
+
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree,
+ List *tlist);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85..a57905f 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 593aec2..230c19c 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4365,11 +4365,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4486,6 +4488,205 @@ select * from
(0 rows)
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- If index conditions are different for each side, we won't select the same
+-- row on both sides, so the join can't be removed.
+create table sl(a int, b int);
+create unique index sl_ab on sl(a, b);
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Bitmap Heap Scan on sl t1
+ Recheck Cond: (b = 1)
+ -> Bitmap Index Scan on sl_ab
+ Index Cond: (b = 1)
+ -> Materialize
+ -> Bitmap Heap Scan on sl t2
+ Recheck Cond: (b = 2)
+ -> Bitmap Index Scan on sl_ab
+ Index Cond: (b = 2)
+(11 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+--
-- Test hints given on incorrect column references are useful
--
select t1.uunique1 from
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65d..b1e2483 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 34d21d0..4b0ea7b 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1570,6 +1570,96 @@ select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- If index conditions are different for each side, we won't select the same
+-- row on both sides, so the join can't be removed.
+create table sl(a int, b int);
+create unique index sl_ab on sl(a, b);
+
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
+--
-- Test hints given on incorrect column references are useful
--
Hi Tom,
Thanks for the update.
On 2/22/19 03:25, Tom Lane wrote:
* My compiler was bitching about misplaced declarations, so I moved
some variable declarations accordingly. I couldn't help noticing
that many of those wouldn't have been a problem in the first place
if you were following project style for loops around list_delete_cell
calls, which usually look more like this:
Changed them to conform to project style. After a couple of segfaults I
remembered why I wrote them the way I did: you can't use plain
'continue' with our style of loops and also have to update prev, and I
can't seem to remember to do that.
I notice though that there's one unexplained plan change remaining
in join.out:@@ -4365,11 +4365,13 @@ explain (costs off) select p.* from (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k where p.k = 1 and p.k = 2; - QUERY PLAN --------------------------- + QUERY PLAN +------------------------------------------------ Result One-Time Filter: false -(2 rows) + -> Index Scan using parent_pkey on parent x + Index Cond: (k = 1) +(4 rows)-- bug 5255: this is not optimizable by join removal
begin;That sure looks like a bug. I don't have time to look for the
cause right now.
David also asked about this before. This is the same plan you'd get for
'select * from parent where k = 1 and k = 2', and this plan is exposed
by join removal. So this is not a bug in join removal itself.
I also noticed that the test results show that when a table
is successfully optimized away, the remaining reference seems
to have the alias of the second reference not the first one.
That seems a little ... weird. It's just cosmetic of course, but
why is that?
Can you point me to a query that looks wrong?
Also, I did notice that you'd stuck a declaration for
"struct UniqueIndexInfo" into paths.h, which then compelled you
to include that header in planmain.h. This seems like poor style;
I'd have been inclined to put the struct in pathnodes.h instead.
Moved.
I wonder why you didn't include the Relids into UniqueIndexInfo as well
... and maybe make it a proper Node so that unique_for_rels could be
printed by outfuncs.c.
We also prove and cache the uniqueness for subqueries, for which the
UniqueIndexInfo is not relevant, that's why it was optional and stored
in a parallel list. Now I changed it to UniqueRelInfo which always has
outerrelids and optionally the unique index.
I also fixed a bug with not updating the references in HAVING clause.
New version is attached.
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v11-0001-Remove-self-joins-on-a-unique-column.patchtext/x-patch; name=v11-0001-Remove-self-joins-on-a-unique-column.patchDownload
From 5430a9ef35f5e21dade56fc5573ed7f377dce757 Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Date: Thu, 28 Feb 2019 17:06:45 +0300
Subject: [PATCH v11] Remove self joins on a unique column
---
src/backend/nodes/outfuncs.c | 19 +-
src/backend/optimizer/path/indxpath.c | 26 +-
src/backend/optimizer/path/joinpath.c | 6 +-
src/backend/optimizer/plan/analyzejoins.c | 968 +++++++++++++++++++++++++++---
src/backend/optimizer/plan/planmain.c | 7 +-
src/backend/optimizer/util/pathnode.c | 3 +-
src/backend/optimizer/util/relnode.c | 26 +-
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 21 +-
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 4 +-
src/include/optimizer/planmain.h | 10 +-
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 219 ++++++-
src/test/regress/sql/equivclass.sql | 17 +
src/test/regress/sql/join.sql | 94 +++
16 files changed, 1359 insertions(+), 99 deletions(-)
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 65302fe..5ffe3ba 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2261,7 +2261,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_OID_FIELD(userid);
WRITE_BOOL_FIELD(useridiscurrent);
/* we don't try to print fdwroutine or fdw_private */
- /* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+ WRITE_NODE_FIELD(unique_for_rels);
+ /* can't print non_unique_for_rels; BMSes aren't Nodes */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_UINT_FIELD(baserestrict_min_security);
WRITE_NODE_FIELD(joininfo);
@@ -2334,6 +2335,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
}
static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+ WRITE_NODE_TYPE("UNIQUERELINFO");
+
+ WRITE_BITMAPSET_FIELD(outerrelids);
+ if (node->index)
+ {
+ WRITE_OID_FIELD(index->indexoid);
+ WRITE_NODE_FIELD(clauses);
+ }
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -4085,6 +4099,9 @@ outNode(StringInfo str, const void *obj)
case T_StatisticExtInfo:
_outStatisticExtInfo(str, obj);
break;
+ case T_UniqueRelInfo:
+ _outUniqueRelInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 3434219..7b259f0 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3583,7 +3583,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueRelInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3604,12 +3605,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **unique_info)
{
ListCell *ic;
Assert(list_length(exprlist) == list_length(oprlist));
+ if (unique_info)
+ *unique_info = NULL;
+
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -3660,6 +3665,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *matched_restrictlist = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3708,6 +3714,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ matched_restrictlist = lappend(matched_restrictlist, rinfo);
break;
}
}
@@ -3750,7 +3757,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (unique_info)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *unique_info = makeNode(UniqueRelInfo);
+ (*unique_info)->index = ind;
+ (*unique_info)->clauses = list_copy(matched_restrictlist);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (matched_restrictlist)
+ list_free(matched_restrictlist);
return true;
+ }
+ if (matched_restrictlist)
+ list_free(matched_restrictlist);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index d8ff4bf..89cd236 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index a4efa69..130a2b8 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -39,14 +40,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueRelInfo **unique_info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ UniqueRelInfo **unique_info);
/*
@@ -58,7 +60,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -162,7 +164,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -238,67 +239,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -568,7 +526,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -643,10 +601,17 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueRelInfo **unique_info)
{
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* We could skip a couple of tests here if we assume all callers checked
* rel_supports_distinctness first, but it doesn't seem worth taking any
@@ -661,8 +626,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ unique_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -966,6 +931,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueRelInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -974,12 +943,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueRelInfo **unique_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *unique_info;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ if (unique_info_out)
+ *unique_info_out = NULL;
+
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -999,10 +979,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ unique_info = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(unique_info->outerrelids, outerrelids))
+ {
+ if (unique_info_out)
+ *unique_info_out = unique_info;
return true; /* Success! */
+ }
}
/*
@@ -1019,7 +1003,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &unique_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1032,10 +1016,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
- innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ if (!unique_info)
+ unique_info = makeNode(UniqueRelInfo);
+ unique_info->outerrelids = bms_copy(outerrelids);
+ innerrel->unique_for_rels = lappend(innerrel->unique_for_rels, unique_info);
MemoryContextSwitchTo(old_context);
+ if (unique_info_out)
+ *unique_info_out = unique_info;
+
return true; /* Success! */
}
else
@@ -1081,11 +1070,15 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueRelInfo **unique_info)
{
List *clause_list = NIL;
ListCell *lc;
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* Search for mergejoinable clauses that constrain the inner rel against
* the outer rel. If an operator is mergejoinable then it behaves like
@@ -1123,5 +1116,842 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, unique_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ ((Var *) node)->varnoold = context->newRelid;
+ return false;
+ }
+
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell;
+ ListCell *next;
+
+ for (cell = list_head(ec->ec_members); cell; cell = next)
+ {
+ ListCell *otherCell;
+ EquivalenceMember *em = lfirst(cell);
+
+ next = lnext(cell);
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ prev = cell;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ Assert(em->em_nullable_relids == NULL);
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ foreach (otherCell, ec->ec_members)
+ {
+ EquivalenceMember *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(EquivalenceMember, lfirst(otherCell));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (otherCell)
+ ec->ec_members = list_delete_cell(ec->ec_members, cell, prev);
+ else
+ prev = cell;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell;
+ ListCell *next;
+
+ for (cell = list_head(*sources); cell; cell = next)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+
+ next = lnext(cell);
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ prev = cell;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach (otherCell, *sources)
+ {
+ RestrictInfo *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(RestrictInfo, lfirst(otherCell));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (otherCell)
+ *sources = list_delete_cell(*sources, cell, prev);
+ else
+ {
+ prev = cell;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_relid(&rinfo->required_relids, toRemove, toKeep);
+ change_relid(&rinfo->left_relids, toRemove, toKeep);
+ change_relid(&rinfo->right_relids, toRemove, toKeep);
+ change_relid(&rinfo->clause_relids, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id. Each
+ * element is a set of relation ids with which this relation has a special
+ * join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+
+ /*
+ * Top-level targetlist of the query. We have to update any references
+ * it has to the relations we remove.
+ */
+ List *targetlist;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ List *toAppend;
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+ ListCell *otherCell;
+
+ /*
+ * We can't have an EC-derived clause that joins to some third
+ * relation
+ */
+ Assert(!(is_join_clause && rinfo->parent_ec != NULL));
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because it
+ * may be already present in the joininfo or baserestrictinfo. Still,
+ * we have to switch it to point to the remaining relation. This is
+ * important for join clauses that reference both relations, because
+ * they are included in both joininfos.
+ */
+ change_varno(rinfo->clause, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If the clause has the form of "X = X", replace it with null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Expr *leftOp = (Expr *) get_leftop(rinfo->clause);
+ Expr *rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *test = makeNode(NullTest);
+ test->arg = leftOp;
+ test->nulltesttype = IS_NOT_NULL;
+ test->argisrow = false;
+ test->location = -1;
+ rinfo->clause = (Expr *) test;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs,
+ node);
+ }
+
+ for (int i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (int i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ foreach (cell, otherrel->lateral_vars)
+ change_varno(lfirst(cell), toRemove->relid, toKeep->relid);
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueRelInfo *outerinfo = NULL;
+ UniqueRelInfo *innerinfo = NULL;
+ ListCell *outerCell, *innerCell;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outerinfo);
+ if (!outerinfo || !outerinfo->index)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &innerinfo);
+ if (!innerinfo || !innerinfo->index)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outerinfo->index->indexoid != innerinfo->index->indexoid)
+ return false;
+
+ /*
+ * The clauses that make a relation unique must be the same for both
+ * relations, or else we won't match the same row on each side of join.
+ *
+ * The lists of matching clauses are ordered as the index columns, so we
+ * just compare the list elements one by one. The varnos are different, so
+ * we copy the clauses and replace all mentions of outer varno with the
+ * inner, so that we can use equal().
+ */
+ forboth(innerCell, innerinfo->clauses, outerCell, outerinfo->clauses)
+ {
+ Expr *innerExpr = copyObject(castNode(RestrictInfo, lfirst(innerCell))->clause);
+ Expr *outerExpr = copyObject(castNode(RestrictInfo, lfirst(outerCell))->clause);
+ change_varno(outerExpr, outer->relid, inner->relid);
+ change_varno(innerExpr, outer->relid, inner->relid);
+ if (!equal(outerExpr, innerExpr))
+ {
+ pfree(outerExpr);
+ pfree(innerExpr);
+ return false;
+ }
+ pfree(outerExpr);
+ pfree(innerExpr);
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any
+ * relations that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) scratch->targetlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *prev, *cell, *next;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /*
+ * This optimization won't work for tables that have inheritance
+ * children.
+ */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ prev = NULL;
+ for (cell = list_head(*joinlist); cell; cell = next)
+ {
+ Node *node = lfirst(cell);
+
+ next = lnext(cell);
+
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef *) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell, prev);
+ }
+ else
+ prev = cell;
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist, List *targetlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+ scratch.targetlist = targetlist;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 3cedd01..8d6036c 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -224,7 +224,7 @@ query_planner(PlannerInfo *root, List *tlist,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -233,6 +233,11 @@ query_planner(PlannerInfo *root, List *tlist,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist, tlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 169e51e..493f425 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1578,7 +1578,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 4130514..fde4118 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -39,14 +39,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -556,7 +552,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -659,7 +655,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -981,7 +977,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -994,9 +990,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1007,8 +1003,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1017,7 +1013,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1043,7 +1039,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1053,7 +1049,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index f938925..9b04f6c 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -272,6 +272,7 @@ typedef enum NodeTag
T_RollupData,
T_GroupingSetData,
T_StatisticExtInfo,
+ T_UniqueRelInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a008ae0..918049b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -520,8 +520,11 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of UniqueRelInfos, each of them recording
+ * a set of other rels for which this one has been proven
+ * unique. If this is a baserel that is made unique by an
+ * index, UniqueRelInfo also stores the information about
+ * that index.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
@@ -2466,4 +2469,18 @@ typedef struct JoinCostWorkspace
double inner_rows_total;
} JoinCostWorkspace;
+/*
+ * UniqueRelInfo records a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids. If the uniqueness is
+ * guaranteed by a unique index, this index and its corresponding clauses
+ * are saved in index and clauses fields.
+ */
+typedef struct UniqueRelInfo
+{
+ NodeTag tag;
+ Relids outerrelids;
+ IndexOptInfo *index;
+ List *clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 574bb85..7eb59c1 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -287,6 +287,11 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 040335a..7e09a20 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,9 +71,11 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
* routines to generate index paths
*/
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
+
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 3bbdb5e..291d871 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -16,6 +16,7 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#include "optimizer/paths.h"
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
@@ -95,13 +96,18 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
+
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueRelInfo **unique_info);
+
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree,
+ List *tlist);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85..a57905f 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 593aec2..3e57fc5 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4365,11 +4365,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4486,6 +4488,217 @@ select * from
(0 rows)
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- If index conditions are different for each side, we won't select the same
+-- row on both sides, so the join can't be removed.
+create table sl(a int, b int);
+create unique index sl_ab on sl(a, b);
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Bitmap Heap Scan on sl t1
+ Recheck Cond: (b = 1)
+ -> Bitmap Index Scan on sl_ab
+ Index Cond: (b = 1)
+ -> Materialize
+ -> Bitmap Heap Scan on sl t2
+ Recheck Cond: (b = 2)
+ -> Bitmap Index Scan on sl_ab
+ Index Cond: (b = 2)
+(11 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+--
-- Test hints given on incorrect column references are useful
--
select t1.uunique1 from
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65d..b1e2483 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 34d21d0..9f45941 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1570,6 +1570,100 @@ select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- If index conditions are different for each side, we won't select the same
+-- row on both sides, so the join can't be removed.
+create table sl(a int, b int);
+create unique index sl_ab on sl(a, b);
+
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
+--
-- Test hints given on incorrect column references are useful
--
--
2.7.4
On Fri, 1 Mar 2019 at 03:09, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:
New version is attached.
I started looking over v11 and I'm wondering why you really need to
know which unique index proved the join unique?
I removed that check and I see it causes the following to remove the self-join:
create unique index on t1(a);
create unique index on t1(b);
explain select * from t1 inner join t1 t2 on t1.a=t2.b;
QUERY PLAN
--------------------------------------------------------
Seq Scan on t1 t2 (cost=0.00..38.25 rows=11 width=16)
Filter: (a = b)
(2 rows)
I had thought that the loop in is_unique_self_join() would have
rejected this, but looking more closely it's only checking the list of
join quals matched to each index match each other. Shouldn't it be
testing that the expressions on either side of the OpExprs are the
same after aligning the varnos? At the moment you're testing
innerrel_is_unique() for either side of the join, if you made this
change then couldn't you just test one rel and look for a unique index
on the join quals that match both sides of the join at the same time?
Probably relation_has_unique_index_for() is not the right tool for
that job, so you might need to write another function that does the
additional checks. Maybe it would be possible to split out some of the
code into helper functions so you don't have to copy the bulk of it.
The reason I mention this is that rel_is_distinct_for() also
considered subqueries. This allows the LEFT JOIN removal code to
remove joins like:
SELECT t1.* FROM t1 LEFT JOIN (SELECT DISTINCT a FROM t2) t2 on t1.a=t2.a;
In this case, the DISTINCT clause was used for unique proofs, not a
unique index.
It does not seem unreasonable that someone one day might want to
extend the self join removal code to do the same, e.g:
SELECT * FROM t1 INNER JOIN (SELECT DISTINCT a FROM t1) t2 on t1.a = t2.a;
and you're not setting it up to be very easy to do that because you're
insisting the proofs are unique indexes only.
What do you think?
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 3/14/19 14:21, David Rowley wrote:
What do you think?
Let's recap the conditions when we can remove a self-join. It is when
for each outer row, 1) at most one inner row matches the join clauses,
and 2) it is the same row as the outer one. I'm not sure what (2) means
precisely in a general case, but for a plain table, we can identify
these rows by ctid. So when both sides have the same unique index with
the same clauses, we conclude that we are always dealing with the same
row (as identified by ctid) on both sides, hence the join can be
replaced with a scan.
The code I wrote just checks for the above conditions. The data we need
for these checks is a byproduct of checking the relations for
uniqueness, which we do anyway, so we just cache it for a negligible cost.
I didn't write it in a more generic way because I don't understand the
conditions for generic case. In your DISTINCT example, the join can be
removed indeed. But if we select some columns from the inner side apart
from the join ones, we can't remove the join anymore:
select * from t1, (select distinct on (a) a, b from t1) tt where t1.a =
tt.a;
I think this might be a different kind of optimization, where we remove
the self-join if the inner side is unique, and no inner columns are
selected besides the join ones.
Also, reading your letter I realized that I don't commute the index
clauses correctly before comparing them in is_unique_self_join, so I
fixed this in the new version of the patch.
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v12-0001-Remove-unique-self-joins.patchtext/x-patch; name=v12-0001-Remove-unique-self-joins.patchDownload
From d3bb27a26919441df1c31645ff39655c124e5ae6 Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Date: Wed, 20 Mar 2019 15:15:33 +0300
Subject: [PATCH v12] Remove unique self joins.
---
src/backend/nodes/outfuncs.c | 19 +-
src/backend/optimizer/path/indxpath.c | 28 +-
src/backend/optimizer/path/joinpath.c | 6 +-
src/backend/optimizer/plan/analyzejoins.c | 965 +++++++++++++++++++++++++++---
src/backend/optimizer/plan/planmain.c | 7 +-
src/backend/optimizer/util/pathnode.c | 3 +-
src/backend/optimizer/util/relnode.c | 26 +-
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 22 +-
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 4 +-
src/include/optimizer/planmain.h | 10 +-
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 219 ++++++-
src/test/regress/sql/equivclass.sql | 17 +
src/test/regress/sql/join.sql | 94 +++
16 files changed, 1359 insertions(+), 99 deletions(-)
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 69179a0..7ed1f47 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2263,7 +2263,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_OID_FIELD(userid);
WRITE_BOOL_FIELD(useridiscurrent);
/* we don't try to print fdwroutine or fdw_private */
- /* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+ WRITE_NODE_FIELD(unique_for_rels);
+ /* can't print non_unique_for_rels; BMSes aren't Nodes */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_UINT_FIELD(baserestrict_min_security);
WRITE_NODE_FIELD(joininfo);
@@ -2336,6 +2337,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
}
static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+ WRITE_NODE_TYPE("UNIQUERELINFO");
+
+ WRITE_BITMAPSET_FIELD(outerrelids);
+ if (node->index)
+ {
+ WRITE_OID_FIELD(index->indexoid);
+ WRITE_NODE_FIELD(column_values);
+ }
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -4088,6 +4102,9 @@ outNode(StringInfo str, const void *obj)
case T_StatisticExtInfo:
_outStatisticExtInfo(str, obj);
break;
+ case T_UniqueRelInfo:
+ _outUniqueRelInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 3434219..88e61d4 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3583,7 +3583,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueRelInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3604,12 +3605,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **unique_info)
{
ListCell *ic;
Assert(list_length(exprlist) == list_length(oprlist));
+ if (unique_info)
+ *unique_info = NULL;
+
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -3660,6 +3665,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *column_values = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3708,6 +3714,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ column_values = lappend(column_values, rinfo->outer_is_left
+ ? get_leftop(rinfo->clause)
+ : get_rightop(rinfo->clause));
break;
}
}
@@ -3750,7 +3759,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (unique_info)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *unique_info = makeNode(UniqueRelInfo);
+ (*unique_info)->index = ind;
+ (*unique_info)->column_values = list_copy(column_values);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (column_values)
+ list_free(column_values);
return true;
+ }
+ if (column_values)
+ list_free(column_values);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index d8ff4bf..89cd236 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index a4efa69..a310d89 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -39,14 +40,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueRelInfo **unique_info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ UniqueRelInfo **unique_info);
/*
@@ -58,7 +60,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -162,7 +164,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -238,67 +239,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -568,7 +526,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -643,10 +601,17 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueRelInfo **unique_info)
{
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* We could skip a couple of tests here if we assume all callers checked
* rel_supports_distinctness first, but it doesn't seem worth taking any
@@ -661,8 +626,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ unique_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -966,6 +931,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueRelInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -974,12 +943,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueRelInfo **unique_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *unique_info;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ if (unique_info_out)
+ *unique_info_out = NULL;
+
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -999,10 +979,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ unique_info = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(unique_info->outerrelids, outerrelids))
+ {
+ if (unique_info_out)
+ *unique_info_out = unique_info;
return true; /* Success! */
+ }
}
/*
@@ -1019,7 +1003,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &unique_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1032,10 +1016,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
- innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ if (!unique_info)
+ unique_info = makeNode(UniqueRelInfo);
+ unique_info->outerrelids = bms_copy(outerrelids);
+ innerrel->unique_for_rels = lappend(innerrel->unique_for_rels, unique_info);
MemoryContextSwitchTo(old_context);
+ if (unique_info_out)
+ *unique_info_out = unique_info;
+
return true; /* Success! */
}
else
@@ -1081,11 +1070,15 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueRelInfo **unique_info)
{
List *clause_list = NIL;
ListCell *lc;
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* Search for mergejoinable clauses that constrain the inner rel against
* the outer rel. If an operator is mergejoinable then it behaves like
@@ -1123,5 +1116,839 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, unique_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ ((Var *) node)->varnoold = context->newRelid;
+ return false;
+ }
+
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell;
+ ListCell *next;
+
+ for (cell = list_head(ec->ec_members); cell; cell = next)
+ {
+ ListCell *otherCell;
+ EquivalenceMember *em = lfirst(cell);
+
+ next = lnext(cell);
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ prev = cell;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ Assert(em->em_nullable_relids == NULL);
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ foreach (otherCell, ec->ec_members)
+ {
+ EquivalenceMember *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(EquivalenceMember, lfirst(otherCell));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (otherCell)
+ ec->ec_members = list_delete_cell(ec->ec_members, cell, prev);
+ else
+ prev = cell;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell;
+ ListCell *next;
+
+ for (cell = list_head(*sources); cell; cell = next)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+
+ next = lnext(cell);
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ prev = cell;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach (otherCell, *sources)
+ {
+ RestrictInfo *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(RestrictInfo, lfirst(otherCell));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (otherCell)
+ *sources = list_delete_cell(*sources, cell, prev);
+ else
+ {
+ prev = cell;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_relid(&rinfo->required_relids, toRemove, toKeep);
+ change_relid(&rinfo->left_relids, toRemove, toKeep);
+ change_relid(&rinfo->right_relids, toRemove, toKeep);
+ change_relid(&rinfo->clause_relids, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id. Each
+ * element is a set of relation ids with which this relation has a special
+ * join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+
+ /*
+ * Top-level targetlist of the query. We have to update any references
+ * it has to the relations we remove.
+ */
+ List *targetlist;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ List *toAppend;
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+ ListCell *otherCell;
+
+ /*
+ * We can't have an EC-derived clause that joins to some third
+ * relation
+ */
+ Assert(!(is_join_clause && rinfo->parent_ec != NULL));
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because it
+ * may be already present in the joininfo or baserestrictinfo. Still,
+ * we have to switch it to point to the remaining relation. This is
+ * important for join clauses that reference both relations, because
+ * they are included in both joininfos.
+ */
+ change_varno(rinfo->clause, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If the clause has the form of "X = X", replace it with null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Expr *leftOp = (Expr *) get_leftop(rinfo->clause);
+ Expr *rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *test = makeNode(NullTest);
+ test->arg = leftOp;
+ test->nulltesttype = IS_NOT_NULL;
+ test->argisrow = false;
+ test->location = -1;
+ rinfo->clause = (Expr *) test;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs,
+ node);
+ }
+
+ for (int i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (int i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ foreach (cell, otherrel->lateral_vars)
+ change_varno(lfirst(cell), toRemove->relid, toKeep->relid);
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueRelInfo *outerinfo = NULL;
+ UniqueRelInfo *innerinfo = NULL;
+ List *outerValues, *innerValues;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outerinfo);
+ if (!outerinfo || !outerinfo->index)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &innerinfo);
+ if (!innerinfo || !innerinfo->index)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outerinfo->index->indexoid != innerinfo->index->indexoid)
+ return false;
+
+ /*
+ * We have proven that for both relations, the same unique index guarantees
+ * that there is at most one row where columns equal given values. These
+ * values must be the same for both relations, or else we won't match the
+ * same row on each side of join. A value may be either Const or Var of some
+ * other relation. For the purposes of this proof, the Vars of the inner and
+ * outer relation are the same, so we replace outer varno with inner and
+ * compare the column values using equal().
+ */
+ innerValues = copyObject(innerinfo->column_values);
+ outerValues = copyObject(outerinfo->column_values);
+ change_varno((Expr *) innerValues, outer->relid, inner->relid);
+ change_varno((Expr *) outerValues, outer->relid, inner->relid);
+ if (!equal(outerValues, innerValues))
+ {
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+ return false;
+ }
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any
+ * relations that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) scratch->targetlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *prev, *cell, *next;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /*
+ * This optimization won't work for tables that have inheritance
+ * children.
+ */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ prev = NULL;
+ for (cell = list_head(*joinlist); cell; cell = next)
+ {
+ Node *node = lfirst(cell);
+
+ next = lnext(cell);
+
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef *) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell, prev);
+ }
+ else
+ prev = cell;
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist, List *targetlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+ scratch.targetlist = targetlist;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 3cedd01..8d6036c 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -224,7 +224,7 @@ query_planner(PlannerInfo *root, List *tlist,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -233,6 +233,11 @@ query_planner(PlannerInfo *root, List *tlist,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist, tlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 169e51e..493f425 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1578,7 +1578,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 4130514..fde4118 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -39,14 +39,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -556,7 +552,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -659,7 +655,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -981,7 +977,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -994,9 +990,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1007,8 +1003,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1017,7 +1013,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1043,7 +1039,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1053,7 +1049,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index ffb4cd4..59ca1fe 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -272,6 +272,7 @@ typedef enum NodeTag
T_RollupData,
T_GroupingSetData,
T_StatisticExtInfo,
+ T_UniqueRelInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 253e0b7..0592b64 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -522,8 +522,11 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of UniqueRelInfos, each of them recording
+ * a set of other rels for which this one has been proven
+ * unique. If this is a baserel that is made unique by an
+ * index, UniqueRelInfo also stores the information about
+ * that index.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
@@ -2471,4 +2474,19 @@ typedef struct JoinCostWorkspace
double inner_rows_total;
} JoinCostWorkspace;
+/*
+ * UniqueRelInfo records a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids. If the uniqueness is
+ * guaranteed by a unique index, this index is also saved. The values that
+ * constrain index columns, be it Vars of outer relations or Consts, are saved
+ * to column_values list.
+ */
+typedef struct UniqueRelInfo
+{
+ NodeTag tag;
+ Relids outerrelids;
+ IndexOptInfo *index;
+ List *column_values;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 574bb85..7eb59c1 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -287,6 +287,11 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 36d12bc..4260227 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -70,9 +70,11 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
* routines to generate index paths
*/
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
+
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 3bbdb5e..291d871 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -16,6 +16,7 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#include "optimizer/paths.h"
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
@@ -95,13 +96,18 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
+
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueRelInfo **unique_info);
+
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree,
+ List *tlist);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85..a57905f 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 88fcd52..38e3842 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4365,11 +4365,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4486,6 +4488,217 @@ select * from
(0 rows)
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- If index conditions are different for each side, we won't select the same
+-- row on both sides, so the join can't be removed.
+create table sl(a int, b int);
+create unique index sl_ab on sl(a, b);
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Bitmap Heap Scan on sl t1
+ Recheck Cond: (b = 1)
+ -> Bitmap Index Scan on sl_ab
+ Index Cond: (b = 1)
+ -> Materialize
+ -> Bitmap Heap Scan on sl t2
+ Recheck Cond: (b = 2)
+ -> Bitmap Index Scan on sl_ab
+ Index Cond: (b = 2)
+(11 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+--
-- Test hints given on incorrect column references are useful
--
select t1.uunique1 from
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65d..b1e2483 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c247509..001b41b 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1570,6 +1570,100 @@ select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- If index conditions are different for each side, we won't select the same
+-- row on both sides, so the join can't be removed.
+create table sl(a int, b int);
+create unique index sl_ab on sl(a, b);
+
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
+--
-- Test hints given on incorrect column references are useful
--
--
2.7.4
On Thu, 21 Mar 2019 at 01:20, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:
Let's recap the conditions when we can remove a self-join. It is when
for each outer row, 1) at most one inner row matches the join clauses,
and 2) it is the same row as the outer one. I'm not sure what (2) means
precisely in a general case, but for a plain table, we can identify
these rows by ctid. So when both sides have the same unique index with
the same clauses, we conclude that we are always dealing with the same
row (as identified by ctid) on both sides, hence the join can be
replaced with a scan.The code I wrote just checks for the above conditions. The data we need
for these checks is a byproduct of checking the relations for
uniqueness, which we do anyway, so we just cache it for a negligible cost.I didn't write it in a more generic way because I don't understand the
conditions for generic case. In your DISTINCT example, the join can be
removed indeed. But if we select some columns from the inner side apart
from the join ones, we can't remove the join anymore:select * from t1, (select distinct on (a) a, b from t1) tt where t1.a =
tt.a;I think this might be a different kind of optimization, where we remove
the self-join if the inner side is unique, and no inner columns are
selected besides the join ones.Also, reading your letter I realized that I don't commute the index
clauses correctly before comparing them in is_unique_self_join, so I
fixed this in the new version of the patch.
I really just don't think checking the unique indexes match is going
to cut it. You should be looking for a unique index where the join
clauses match on either side of the join, not looking independently
and then checking the indexes are the same ones.
Here's an example of what can go wrong with your current code:
drop table abc;
create table abc(a int, b int, c int);
create unique index on abc(a);
create unique index on abc(b);
create unique index on abc(c);
explain select * from abc a1 inner join abc a2 on a1.a=a2.b and a1.c=a2.c;
QUERY PLAN
---------------------------------------------------------
Seq Scan on abc a2 (cost=0.00..35.50 rows=10 width=24)
Filter: ((c IS NOT NULL) AND (a = b))
(2 rows)
The above seems fine, but let's try again, this time change the order
that the indexes are defined.
drop table abc;
create table abc(a int, b int, c int);
create unique index on abc(a);
create unique index on abc(c);
create unique index on abc(b);
explain select * from abc a1 inner join abc a2 on a1.a=a2.b and a1.c=a2.c;
QUERY PLAN
-----------------------------------------------------------------------
Hash Join (cost=61.00..102.11 rows=1 width=24)
Hash Cond: ((a1.a = a2.b) AND (a1.c = a2.c))
-> Seq Scan on abc a1 (cost=0.00..30.40 rows=2040 width=12)
-> Hash (cost=30.40..30.40 rows=2040 width=12)
-> Seq Scan on abc a2 (cost=0.00..30.40 rows=2040 width=12)
(5 rows)
oops. I think behaviour like this that depends on the order that
indexes are created is not going to cut it. Probably you maybe could
restrict the join qual list to just quals that have the same expr on
either side of the join, that way you could still use
innerrel_is_unique() to check the inner rel is unique. You'll likely
need to pass force_cache as false though, since you don't want to
cache non-uniqueness with a subset of joinquals. Doing that could
cause unique joins not to work when the join search is done via GEQO.
I also think this way would give you the subquery GROUP BY / DISTINCT
self join removal for just about free. However, there might be more
cleanup to be done in that case... I've not thought about that too
hard.
I'm going to set this to waiting on author, as that's quite a decent
sized change.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 3/21/19 01:54, David Rowley wrote:
I really just don't think checking the unique indexes match is going
to cut it. You should be looking for a unique index where the join
clauses match on either side of the join, not looking independently
and then checking the indexes are the same ones.
The bug you mention later is an implementation bug that can be fixed (I
will expand on that below). Besides this, do you think current self-join
detection algorithm has fundamental correctness problems? I am not aware
of such problems and this algorithm reflects my understanding of what
constitutes a removable self-join, so it would be helpful if you could
explain what exactly is wrong with it.
Your alternative idea sounds plausible, but I won't know it's correct
for sure until I implement it, which is a nontrivial amount of work. I
am also concerned that we will have to redo calculations similar to
innerrel_is_unique(), because having near-zero overhead is a hard
prerequisite for this patch, as was discussed upthread.
In short, I am reluctant to implement a new approach to detection until
I understand why the current one is fundamentally broken.
Here's an example of what can go wrong with your current code:
This is a bug indeed. Unique index search is not exhaustive, so if many
indexes match the join quals, we might not find the same index for both
sides. I think this can be overcome if we switch to exhaustive search in
relation_has_unique_index_for, and then try to find matching indexes in
is_unique_self_join. Please see the new patch for the fix.
I also think this way would give you the subquery GROUP BY / DISTINCT
self join removal for just about free.
Could you explain how exactly we can generalize join removal to the
DISTINCT/GROUP BY case? I understand the removable self-joins as having
the same row on both sides of join, as identified by ctid, but I'm not
sure how to apply this to subqueries. Your earlier DISTINCT example
looked like it needed a different kind of join removal, with different
conditions for when it applies (please see my previous letter for details).
Another thing I'd like to mention is that this patch splits in two
independent parts, the detection of self-joins and their removal. While
we are looking for agreement on the detection part, could you also
review the removal part? I'm sure it has its own share of problems.
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v13-0001-Remove-unique-self-joins.patchtext/x-patch; name=v13-0001-Remove-unique-self-joins.patchDownload
From c40b4b59df27e24c2238f422d5c6fde6fe1a91d3 Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Date: Fri, 22 Mar 2019 17:25:01 +0300
Subject: [PATCH v13] Remove unique self joins.
---
src/backend/nodes/outfuncs.c | 14 +-
src/backend/optimizer/path/indxpath.c | 46 +-
src/backend/optimizer/path/joinpath.c | 6 +-
src/backend/optimizer/plan/analyzejoins.c | 998 ++++++++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 7 +-
src/backend/optimizer/util/pathnode.c | 3 +-
src/backend/optimizer/util/plancat.c | 7 +-
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/cache/relcache.c | 4 +-
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 25 +-
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 4 +-
src/include/optimizer/planmain.h | 10 +-
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/insert_conflict.out | 2 +-
src/test/regress/expected/join.out | 255 ++++++-
src/test/regress/expected/stats_ext.out | 23 +-
src/test/regress/sql/equivclass.sql | 17 +
src/test/regress/sql/join.sql | 108 +++
20 files changed, 1478 insertions(+), 115 deletions(-)
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 910a738..ba03fdf 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2270,7 +2270,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_OID_FIELD(userid);
WRITE_BOOL_FIELD(useridiscurrent);
/* we don't try to print fdwroutine or fdw_private */
- /* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+ WRITE_NODE_FIELD(unique_for_rels);
+ /* can't print non_unique_for_rels; BMSes aren't Nodes */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_UINT_FIELD(baserestrict_min_security);
WRITE_NODE_FIELD(joininfo);
@@ -2343,6 +2344,14 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
}
static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+ WRITE_NODE_TYPE("UNIQUERELINFO");
+
+ WRITE_BITMAPSET_FIELD(outerrelids);
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -4095,6 +4104,9 @@ outNode(StringInfo str, const void *obj)
case T_StatisticExtInfo:
_outStatisticExtInfo(str, obj);
break;
+ case T_UniqueRelInfo:
+ _outUniqueRelInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 3434219..096e786 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3583,7 +3583,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueRelInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3604,12 +3605,24 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **unique_info)
{
ListCell *ic;
+ /*
+ * If we are asked to return the information about unique index, do not
+ * stop after finding one index that guarantees uniqueness, and try to find
+ * several of them. This info is useful for self-join removal. To save
+ * time, don't try to find all of them, and stop after we have found some.
+ */
+ int indexesNeeded = unique_info ? URI_MAX_INDEXES : 1;
+ int indexesFound = 0;
Assert(list_length(exprlist) == list_length(oprlist));
+ if (unique_info)
+ *unique_info = NULL;
+
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -3660,6 +3673,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *column_values = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3708,6 +3722,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ column_values = lappend(column_values, rinfo->outer_is_left
+ ? get_leftop(rinfo->clause)
+ : get_rightop(rinfo->clause));
break;
}
}
@@ -3750,10 +3767,31 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
- return true;
+ {
+ indexesFound++;
+
+ if (unique_info)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ if (!*unique_info)
+ *unique_info = makeNode(UniqueRelInfo);
+ (*unique_info)->n_indexes = indexesFound;
+ (*unique_info)->unique_indexes[indexesFound - 1] = ind->indexoid;
+ (*unique_info)->column_values[indexesFound - 1]
+ = list_copy(column_values);
+ MemoryContextSwitchTo(oldContext);
+ }
+ }
+
+ if (column_values)
+ list_free(column_values);
+
+ if (indexesFound >= indexesNeeded)
+ break;
}
- return false;
+ return indexesFound > 0;
}
/*
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index d8ff4bf..89cd236 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index a4efa69..54220e9 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -39,14 +40,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueRelInfo **unique_info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ UniqueRelInfo **unique_info);
/*
@@ -58,7 +60,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -162,7 +164,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -238,67 +239,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -568,7 +526,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -643,10 +601,17 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueRelInfo **unique_info)
{
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* We could skip a couple of tests here if we assume all callers checked
* rel_supports_distinctness first, but it doesn't seem worth taking any
@@ -661,8 +626,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ unique_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -966,6 +931,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueRelInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -974,12 +943,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueRelInfo **unique_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *unique_info;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ if (unique_info_out)
+ *unique_info_out = NULL;
+
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -999,10 +979,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ unique_info = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(unique_info->outerrelids, outerrelids))
+ {
+ if (unique_info_out)
+ *unique_info_out = unique_info;
return true; /* Success! */
+ }
}
/*
@@ -1019,7 +1003,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &unique_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1032,10 +1016,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
- innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ if (!unique_info)
+ unique_info = makeNode(UniqueRelInfo);
+ unique_info->outerrelids = bms_copy(outerrelids);
+ innerrel->unique_for_rels = lappend(innerrel->unique_for_rels, unique_info);
MemoryContextSwitchTo(old_context);
+ if (unique_info_out)
+ *unique_info_out = unique_info;
+
return true; /* Success! */
}
else
@@ -1081,11 +1070,15 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueRelInfo **unique_info)
{
List *clause_list = NIL;
ListCell *lc;
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* Search for mergejoinable clauses that constrain the inner rel against
* the outer rel. If an operator is mergejoinable then it behaves like
@@ -1123,5 +1116,872 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, unique_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ ((Var *) node)->varnoold = context->newRelid;
+ return false;
+ }
+
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell;
+ ListCell *next;
+
+ for (cell = list_head(ec->ec_members); cell; cell = next)
+ {
+ ListCell *otherCell;
+ EquivalenceMember *em = lfirst(cell);
+
+ next = lnext(cell);
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ prev = cell;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ Assert(em->em_nullable_relids == NULL);
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ foreach (otherCell, ec->ec_members)
+ {
+ EquivalenceMember *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(EquivalenceMember, lfirst(otherCell));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (otherCell)
+ ec->ec_members = list_delete_cell(ec->ec_members, cell, prev);
+ else
+ prev = cell;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell;
+ ListCell *next;
+
+ for (cell = list_head(*sources); cell; cell = next)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+
+ next = lnext(cell);
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ prev = cell;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach (otherCell, *sources)
+ {
+ RestrictInfo *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(RestrictInfo, lfirst(otherCell));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (otherCell)
+ *sources = list_delete_cell(*sources, cell, prev);
+ else
+ {
+ prev = cell;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_relid(&rinfo->required_relids, toRemove, toKeep);
+ change_relid(&rinfo->left_relids, toRemove, toKeep);
+ change_relid(&rinfo->right_relids, toRemove, toKeep);
+ change_relid(&rinfo->clause_relids, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id. Each
+ * element is a set of relation ids with which this relation has a special
+ * join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+
+ /*
+ * Top-level targetlist of the query. We have to update any references
+ * it has to the relations we remove.
+ */
+ List *targetlist;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ List *toAppend;
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+ ListCell *otherCell;
+
+ /*
+ * We can't have an EC-derived clause that joins to some third
+ * relation
+ */
+ Assert(!(is_join_clause && rinfo->parent_ec != NULL));
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because it
+ * may be already present in the joininfo or baserestrictinfo. Still,
+ * we have to switch it to point to the remaining relation. This is
+ * important for join clauses that reference both relations, because
+ * they are included in both joininfos.
+ */
+ change_varno(rinfo->clause, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If the clause has the form of "X = X", replace it with null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Expr *leftOp = (Expr *) get_leftop(rinfo->clause);
+ Expr *rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *test = makeNode(NullTest);
+ test->arg = leftOp;
+ test->nulltesttype = IS_NOT_NULL;
+ test->argisrow = false;
+ test->location = -1;
+ rinfo->clause = (Expr *) test;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs,
+ node);
+ }
+
+ for (int i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (int i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ foreach (cell, otherrel->lateral_vars)
+ change_varno(lfirst(cell), toRemove->relid, toKeep->relid);
+ }
+}
+
+/*
+ * Test for unique self-join.
+ *
+ * We prove that for every pair of joined rows, we actually have the same row
+ * on inner and outer side, as identified by ctid. This happens when for both
+ * sides of join we can apply the same unique index with the same index
+ * conditions.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueRelInfo *outerInfo = NULL;
+ UniqueRelInfo *innerInfo = NULL;
+ int outerPos = 0, innerPos = 0;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outerInfo);
+ if (!outerInfo || !outerInfo->n_indexes)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &innerInfo);
+ if (!innerInfo || !innerInfo->n_indexes)
+ return false;
+
+ /* Find the same unique index for both relations. */
+ for (;;)
+ {
+ Oid outerOid, innerOid;
+ List *outerValues, *innerValues;
+ bool haveSameValues;
+
+ if (outerPos >= outerInfo->n_indexes)
+ break;
+
+ if (innerPos >= innerInfo->n_indexes)
+ break;
+
+ /*
+ * The indexes are sorted in ascending order by Oid, as guaranteed by
+ * RelationGetIndexList.
+ */
+ outerOid = outerInfo->unique_indexes[outerPos];
+ innerOid = innerInfo->unique_indexes[innerPos];
+
+ if (outerOid > innerOid)
+ {
+ innerPos++;
+ continue;
+ }
+ if (outerOid < innerOid)
+ {
+ outerPos++;
+ continue;
+ }
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal given
+ * values. These values must be the same for both relations, or else we
+ * won't match the same row on each side of join. A value may be either
+ * Const or Var of some other relation. For the purposes of this proof,
+ * the Vars of the inner and outer relation are the same, so we replace
+ * outer varno with inner and compare the column values using equal().
+ */
+ innerValues = copyObject(innerInfo->column_values[innerPos]);
+ outerValues = copyObject(outerInfo->column_values[outerPos]);
+ change_varno((Expr *) innerValues, outer->relid, inner->relid);
+ change_varno((Expr *) outerValues, outer->relid, inner->relid);
+ haveSameValues = equal(outerValues, innerValues);
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+
+ if (haveSameValues)
+ return true;
+
+ outerPos++;
+ innerPos++;
+ }
+ return false;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any
+ * relations that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) scratch->targetlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *prev, *cell, *next;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /*
+ * This optimization won't work for tables that have inheritance
+ * children.
+ */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ prev = NULL;
+ for (cell = list_head(*joinlist); cell; cell = next)
+ {
+ Node *node = lfirst(cell);
+
+ next = lnext(cell);
+
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef *) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell, prev);
+ }
+ else
+ prev = cell;
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist, List *targetlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+ scratch.targetlist = targetlist;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 3cedd01..8d6036c 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -224,7 +224,7 @@ query_planner(PlannerInfo *root, List *tlist,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -233,6 +233,11 @@ query_planner(PlannerInfo *root, List *tlist,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist, tlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 169e51e..493f425 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1578,7 +1578,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 30f4dc1..3d50ea2 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -420,7 +420,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
index_close(indexRelation, NoLock);
- indexinfos = lcons(info, indexinfos);
+ /*
+ * Keep the list of IndexOptInfo sorted in the same order as the
+ * Oids returned by RelationGetIndexList. This ordering is used
+ * to simplify index matching for self-join removal.
+ */
+ indexinfos = lappend(indexinfos, info);
}
list_free(indexoidlist);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 4130514..fde4118 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -39,14 +39,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -556,7 +552,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -659,7 +655,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -981,7 +977,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -994,9 +990,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1007,8 +1003,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1017,7 +1013,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1043,7 +1039,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1053,7 +1049,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 84609e0..3bf5bff 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -4298,7 +4298,9 @@ RelationGetFKeyList(Relation relation)
* needed by the executor, since for index types that we obtain exclusive
* locks on when updating the index, all backends must lock the indexes in
* the same order or we will get deadlocks (see ExecOpenIndices()). Any
- * consistent ordering would do, but ordering by OID is easy.
+ * consistent ordering would do, but ordering by OID is easy. We also rely on
+ * the resulting ordering of IndexOptInfos to simplify index matching for
+ * self-join removal.
*
* Since shared cache inval causes the relcache's copy of the list to go away,
* we return a copy of the list palloc'd in the caller's context. The caller
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index ffb4cd4..59ca1fe 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -272,6 +272,7 @@ typedef enum NodeTag
T_RollupData,
T_GroupingSetData,
T_StatisticExtInfo,
+ T_UniqueRelInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 253e0b7..f6c358e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -522,8 +522,11 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of UniqueRelInfos, each of them recording
+ * a set of other rels for which this one has been proven
+ * unique. If this is a baserel that is made unique by an
+ * index, UniqueRelInfo also stores the information about
+ * that index.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
@@ -2471,4 +2474,22 @@ typedef struct JoinCostWorkspace
double inner_rows_total;
} JoinCostWorkspace;
+/*
+ * UniqueRelInfo records a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids. If the uniqueness was proved
+ * using a unique index, this index is recorded along with the values that
+ * constrain index columns. If there are other unique indexes that can
+ * guarantee uniqueness, they are also saved, up to URI_MAX_INDEXES. The
+ * indexes are ordered by ascending Oid.
+ */
+#define URI_MAX_INDEXES 5
+typedef struct UniqueRelInfo
+{
+ NodeTag tag;
+ Relids outerrelids;
+ int n_indexes;
+ Oid unique_indexes[URI_MAX_INDEXES];
+ List *column_values[URI_MAX_INDEXES];
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 574bb85..7eb59c1 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -287,6 +287,11 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 36d12bc..4260227 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -70,9 +70,11 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
* routines to generate index paths
*/
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
+
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index b093a3c..252b8da 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -16,6 +16,7 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#include "optimizer/paths.h"
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
@@ -95,13 +96,18 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
+
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueRelInfo **unique_info);
+
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree,
+ List *tlist);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85..a57905f 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out
index 1338b2b..306dbd1 100644
--- a/src/test/regress/expected/insert_conflict.out
+++ b/src/test/regress/expected/insert_conflict.out
@@ -53,7 +53,7 @@ explain (costs off) insert into insertconflicttest values(0, 'Crowberry') on con
Conflict Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
-> Result
SubPlan 1
- -> Index Only Scan using both_index_expr_key on insertconflicttest ii
+ -> Index Only Scan using op_index_key on insertconflicttest ii
Index Cond: (key = excluded.key)
SubPlan 2
-> Seq Scan on insertconflicttest ii_1
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 88fcd52..13aa5ba 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4365,11 +4365,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4486,6 +4488,253 @@ select * from
(0 rows)
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- If index conditions are different for each side, we won't select the same
+-- row on both sides, so the join can't be removed.
+create table sl(a int, b int);
+create unique index sl_ab on sl(a, b);
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Bitmap Heap Scan on sl t1
+ Recheck Cond: (b = 1)
+ -> Bitmap Index Scan on sl_ab
+ Index Cond: (b = 1)
+ -> Materialize
+ -> Bitmap Heap Scan on sl t2
+ Recheck Cond: (b = 2)
+ -> Bitmap Index Scan on sl_ab
+ Index Cond: (b = 2)
+(11 rows)
+
+-- Check that we can cope with multiple matching unique indexes on each side
+create table sm(a int unique, b int unique, c int unique);
+explain (costs off)
+select * from sm m, sm n where m.a = n.b and m.c = n.c;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sm n
+ Filter: ((c IS NOT NULL) AND (a = b))
+(2 rows)
+
+explain (costs off)
+select * from sm m, sm n where m.a = n.c and m.b = n.b;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sm n
+ Filter: ((b IS NOT NULL) AND (a = c))
+(2 rows)
+
+explain (costs off)
+select * from sm m, sm n where m.c = n.b and m.a = n.a;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sm n
+ Filter: ((a IS NOT NULL) AND (c = b))
+(2 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+--
-- Test hints given on incorrect column references are useful
--
select t1.uunique1 from
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 054a381..6ba3b0a 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -404,7 +404,7 @@ EXPLAIN (COSTS OFF)
---------------------------------------------------
Bitmap Heap Scan on functional_dependencies
Recheck Cond: ((a = 1) AND (b = '1'::text))
- -> Bitmap Index Scan on fdeps_abc_idx
+ -> Bitmap Index Scan on fdeps_ab_idx
Index Cond: ((a = 1) AND (b = '1'::text))
(4 rows)
@@ -425,7 +425,7 @@ EXPLAIN (COSTS OFF)
---------------------------------------------------
Bitmap Heap Scan on functional_dependencies
Recheck Cond: ((a = 1) AND (b = '1'::text))
- -> Bitmap Index Scan on fdeps_abc_idx
+ -> Bitmap Index Scan on fdeps_ab_idx
Index Cond: ((a = 1) AND (b = '1'::text))
(4 rows)
@@ -445,19 +445,20 @@ INSERT INTO functional_dependencies (a, b, c, filler1)
ANALYZE functional_dependencies;
EXPLAIN (COSTS OFF)
SELECT * FROM functional_dependencies WHERE a = 1 AND b = '1';
- QUERY PLAN
------------------------------------------------------------
- Index Scan using fdeps_abc_idx on functional_dependencies
+ QUERY PLAN
+----------------------------------------------------------
+ Index Scan using fdeps_ab_idx on functional_dependencies
Index Cond: ((a = 1) AND (b = '1'::text))
(2 rows)
EXPLAIN (COSTS OFF)
SELECT * FROM functional_dependencies WHERE a = 1 AND b = '1' AND c = 1;
- QUERY PLAN
------------------------------------------------------------
- Index Scan using fdeps_abc_idx on functional_dependencies
- Index Cond: ((a = 1) AND (b = '1'::text) AND (c = 1))
-(2 rows)
+ QUERY PLAN
+----------------------------------------------------------
+ Index Scan using fdeps_ab_idx on functional_dependencies
+ Index Cond: ((a = 1) AND (b = '1'::text))
+ Filter: (c = 1)
+(3 rows)
-- create statistics
CREATE STATISTICS func_deps_stat (dependencies) ON a, b, c FROM functional_dependencies;
@@ -468,7 +469,7 @@ EXPLAIN (COSTS OFF)
---------------------------------------------------
Bitmap Heap Scan on functional_dependencies
Recheck Cond: ((a = 1) AND (b = '1'::text))
- -> Bitmap Index Scan on fdeps_abc_idx
+ -> Bitmap Index Scan on fdeps_ab_idx
Index Cond: ((a = 1) AND (b = '1'::text))
(4 rows)
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65d..b1e2483 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c247509..c2937ed 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1570,6 +1570,114 @@ select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- If index conditions are different for each side, we won't select the same
+-- row on both sides, so the join can't be removed.
+create table sl(a int, b int);
+create unique index sl_ab on sl(a, b);
+
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check that we can cope with multiple matching unique indexes on each side
+create table sm(a int unique, b int unique, c int unique);
+explain (costs off)
+select * from sm m, sm n where m.a = n.b and m.c = n.c;
+explain (costs off)
+select * from sm m, sm n where m.a = n.c and m.b = n.b;
+explain (costs off)
+select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
+--
-- Test hints given on incorrect column references are useful
--
--
2.7.4
On Sat, 23 Mar 2019 at 03:39, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:
The bug you mention later is an implementation bug that can be fixed (I
will expand on that below). Besides this, do you think current self-join
detection algorithm has fundamental correctness problems? I am not aware
of such problems and this algorithm reflects my understanding of what
constitutes a removable self-join, so it would be helpful if you could
explain what exactly is wrong with it.
I did explain what is wrong with it, and also showed an example of why
it is broken. I didn't see anything that looks fundamentally broken,
just the implementation needs more work.
Your alternative idea sounds plausible, but I won't know it's correct
for sure until I implement it, which is a nontrivial amount of work. I
am also concerned that we will have to redo calculations similar to
innerrel_is_unique(), because having near-zero overhead is a hard
prerequisite for this patch, as was discussed upthread.
Are you worried about bypassing the unique rel cache is going to be too costly?
Here's an example of what can go wrong with your current code:
This is a bug indeed. Unique index search is not exhaustive, so if many
indexes match the join quals, we might not find the same index for both
sides. I think this can be overcome if we switch to exhaustive search in
relation_has_unique_index_for, and then try to find matching indexes in
is_unique_self_join. Please see the new patch for the fix.
I really don't think modifying relation_has_unique_index_for to
collect details for up to 5 indexes is a good fix for this. It looks
like it's still possible to trigger this, just the example would need
to be more complex. Also, you've likely just more than doubled the
runtime of a successful match in relation_has_unique_index_for().
Previously, on average we'd have found that match halfway through the
list of indexes. Now it's most likely you'll end up looking at every
index, even after a match has been found. That does not seem well
aligned to keeping the CPU overhead for the patch low.
What is wrong with just weeding out join quals that don't have the
equivalent expression on either side before passing them to
relation_has_unique_index_for()? That'll save you from getting false
matches like I showed. To make that work it just seems mostly like
you'd mostly just need to swap the order of operations in the patch,
but you'd also likely end up needing to rip out all the UniqueRelInfo
code, since I don't think that'll be needed any longer. Likely that
means your entire patch would be limited to analyzejoins.c, although
I'm unsure what of the eclass editing code should be moved into
equivclass.c.
I also think this way would give you the subquery GROUP BY / DISTINCT
self join removal for just about free.Could you explain how exactly we can generalize join removal to the
DISTINCT/GROUP BY case? I understand the removable self-joins as having
the same row on both sides of join, as identified by ctid, but I'm not
sure how to apply this to subqueries. Your earlier DISTINCT example
looked like it needed a different kind of join removal, with different
conditions for when it applies (please see my previous letter for details).
Well, by removing the requirement that the unique proofs have to come
from a unique index. I don't think you need to ensure this works for
your patch, it would be nice if it did for free. I just don't think
your implementation should block it from ever working.
Another thing I'd like to mention is that this patch splits in two
independent parts, the detection of self-joins and their removal. While
we are looking for agreement on the detection part, could you also
review the removal part? I'm sure it has its own share of problems.
I'd rather focus on the detection method before reviewing the removal
code. If there's some blocker in the detection code then the removal
code is not useful.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, 23 Mar 2019 at 04:13, David Rowley <david.rowley@2ndquadrant.com> wrote:
On Sat, 23 Mar 2019 at 03:39, Alexander Kuzmenkov
This is a bug indeed. Unique index search is not exhaustive, so if many
indexes match the join quals, we might not find the same index for both
sides. I think this can be overcome if we switch to exhaustive search in
relation_has_unique_index_for, and then try to find matching indexes in
is_unique_self_join. Please see the new patch for the fix.I really don't think modifying relation_has_unique_index_for to
collect details for up to 5 indexes is a good fix for this. It looks
like it's still possible to trigger this, just the example would need
to be more complex. Also, you've likely just more than doubled the
runtime of a successful match in relation_has_unique_index_for().
Previously, on average we'd have found that match halfway through the
list of indexes. Now it's most likely you'll end up looking at every
index, even after a match has been found. That does not seem well
aligned to keeping the CPU overhead for the patch low.What is wrong with just weeding out join quals that don't have the
equivalent expression on either side before passing them to
relation_has_unique_index_for()? That'll save you from getting false
matches like I showed. To make that work it just seems mostly like
you'd mostly just need to swap the order of operations in the patch,
but you'd also likely end up needing to rip out all the UniqueRelInfo
code, since I don't think that'll be needed any longer. Likely that
means your entire patch would be limited to analyzejoins.c, although
I'm unsure what of the eclass editing code should be moved into
equivclass.c.
I've done some minimal modifications to your v12 patch to make it work
the way I described. I ended up splitting out the joinqual list into
two lists; one that contains just self join quals that match on either
side, and the other with the remaining quals. We just pass the self
join matching quals to innerrel_is_unique() so we no longer can trick
it into matching with the wrong index.
I didn't modify remove_self_join_rel(). It could make use of the split
lists instead of checking what we've already checked in
split_selfjoin_quals(). i.e. selfjoinquals get a IS NOT NULL test
added to the basequals and the otherjoinquals have their varnos
changed and then applied to the basequals too.
There was a couple of regression test failures using your version of
the tests. One test just went back to what it was before you changed
the output and the other seems like a missed optimisation in your
version of the patch.
Namely, you didn't remove the self join in a case like:
explain select * from t1 inner join t1 t2 on t1.a=t2.a where t1.b = 1
and t2.b = 2;
but my version does. You had commented the test with:
-- If index conditions are different for each side, we won't select the same
-- row on both sides, so the join can't be removed.
but I don't quite understand why we can't remove the join in this
situation. For this particular case no rows can match, so maybe the
plan should really be the same as what happens when you do:
# explain select * from t1 where b = 1 and b = 2;
QUERY PLAN
-------------------------------------------------------------------------
Result (cost=0.15..8.17 rows=1 width=8)
One-Time Filter: false
-> Index Scan using t1_b_key on t1 (cost=0.15..8.17 rows=1 width=8)
Index Cond: (b = 1)
(4 rows)
For now, it still produces a plan like:
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using t1_b_key on t1 t2 (cost=0.15..8.17 rows=1 width=16)
Index Cond: ((b = 2) AND (b = 1))
Filter: (a IS NOT NULL)
(3 rows)
My implementation of split_selfjoin_quals() still needs work. I only
wrote code to handle simple Vars. There could be Unique indexes on
exprs too, which the current code will fail to detect. I don't think
the join quals can contain vars from higher levels, but I didn't go to
the trouble of checking that in detail. If it's possible then we'll
need to reject those.
I also finished off the renaming of remove_useless_left_joins(). I
didn't go into any detail on the actual removal code.
I've attached a complete patch and also a delta against v12.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
v14-remove-unique-self-joins.patchapplication/octet-stream; name=v14-remove-unique-self-joins.patchDownload
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 910a738c20..0463c477e7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2270,7 +2270,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_OID_FIELD(userid);
WRITE_BOOL_FIELD(useridiscurrent);
/* we don't try to print fdwroutine or fdw_private */
- /* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+ WRITE_NODE_FIELD(unique_for_rels);
+ /* can't print non_unique_for_rels; BMSes aren't Nodes */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_UINT_FIELD(baserestrict_min_security);
WRITE_NODE_FIELD(joininfo);
@@ -2342,6 +2343,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
WRITE_BITMAPSET_FIELD(keys);
}
+static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+ WRITE_NODE_TYPE("UNIQUERELINFO");
+
+ WRITE_BITMAPSET_FIELD(outerrelids);
+ if (node->index)
+ {
+ WRITE_OID_FIELD(index->indexoid);
+ WRITE_NODE_FIELD(column_values);
+ }
+}
+
static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
@@ -4095,6 +4109,9 @@ outNode(StringInfo str, const void *obj)
case T_StatisticExtInfo:
_outStatisticExtInfo(str, obj);
break;
+ case T_UniqueRelInfo:
+ _outUniqueRelInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 3434219dbd..88e61d4c54 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3583,7 +3583,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueRelInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3604,12 +3605,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **unique_info)
{
ListCell *ic;
Assert(list_length(exprlist) == list_length(oprlist));
+ if (unique_info)
+ *unique_info = NULL;
+
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -3660,6 +3665,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *column_values = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3708,6 +3714,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ column_values = lappend(column_values, rinfo->outer_is_left
+ ? get_leftop(rinfo->clause)
+ : get_rightop(rinfo->clause));
break;
}
}
@@ -3750,7 +3759,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (unique_info)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *unique_info = makeNode(UniqueRelInfo);
+ (*unique_info)->index = ind;
+ (*unique_info)->column_values = list_copy(column_values);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (column_values)
+ list_free(column_values);
return true;
+ }
+ if (column_values)
+ list_free(column_values);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index d8ff4bf432..89cd236306 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index a4efa693ea..a310d89a39 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -39,14 +40,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueRelInfo **unique_info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ UniqueRelInfo **unique_info);
/*
@@ -58,7 +60,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -162,7 +164,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -238,67 +239,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -568,7 +526,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -643,10 +601,17 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueRelInfo **unique_info)
{
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* We could skip a couple of tests here if we assume all callers checked
* rel_supports_distinctness first, but it doesn't seem worth taking any
@@ -661,8 +626,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ unique_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -966,6 +931,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueRelInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -974,12 +943,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueRelInfo **unique_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *unique_info;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ if (unique_info_out)
+ *unique_info_out = NULL;
+
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -999,10 +979,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ unique_info = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(unique_info->outerrelids, outerrelids))
+ {
+ if (unique_info_out)
+ *unique_info_out = unique_info;
return true; /* Success! */
+ }
}
/*
@@ -1019,7 +1003,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &unique_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1032,10 +1016,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
- innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ if (!unique_info)
+ unique_info = makeNode(UniqueRelInfo);
+ unique_info->outerrelids = bms_copy(outerrelids);
+ innerrel->unique_for_rels = lappend(innerrel->unique_for_rels, unique_info);
MemoryContextSwitchTo(old_context);
+ if (unique_info_out)
+ *unique_info_out = unique_info;
+
return true; /* Success! */
}
else
@@ -1081,11 +1070,15 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueRelInfo **unique_info)
{
List *clause_list = NIL;
ListCell *lc;
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* Search for mergejoinable clauses that constrain the inner rel against
* the outer rel. If an operator is mergejoinable then it behaves like
@@ -1123,5 +1116,839 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, unique_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ ((Var *) node)->varnoold = context->newRelid;
+ return false;
+ }
+
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell;
+ ListCell *next;
+
+ for (cell = list_head(ec->ec_members); cell; cell = next)
+ {
+ ListCell *otherCell;
+ EquivalenceMember *em = lfirst(cell);
+
+ next = lnext(cell);
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ prev = cell;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ Assert(em->em_nullable_relids == NULL);
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ foreach (otherCell, ec->ec_members)
+ {
+ EquivalenceMember *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(EquivalenceMember, lfirst(otherCell));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (otherCell)
+ ec->ec_members = list_delete_cell(ec->ec_members, cell, prev);
+ else
+ prev = cell;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell;
+ ListCell *next;
+
+ for (cell = list_head(*sources); cell; cell = next)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+
+ next = lnext(cell);
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ prev = cell;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach (otherCell, *sources)
+ {
+ RestrictInfo *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(RestrictInfo, lfirst(otherCell));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (otherCell)
+ *sources = list_delete_cell(*sources, cell, prev);
+ else
+ {
+ prev = cell;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_relid(&rinfo->required_relids, toRemove, toKeep);
+ change_relid(&rinfo->left_relids, toRemove, toKeep);
+ change_relid(&rinfo->right_relids, toRemove, toKeep);
+ change_relid(&rinfo->clause_relids, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id. Each
+ * element is a set of relation ids with which this relation has a special
+ * join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+
+ /*
+ * Top-level targetlist of the query. We have to update any references
+ * it has to the relations we remove.
+ */
+ List *targetlist;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ List *toAppend;
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+ ListCell *otherCell;
+
+ /*
+ * We can't have an EC-derived clause that joins to some third
+ * relation
+ */
+ Assert(!(is_join_clause && rinfo->parent_ec != NULL));
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because it
+ * may be already present in the joininfo or baserestrictinfo. Still,
+ * we have to switch it to point to the remaining relation. This is
+ * important for join clauses that reference both relations, because
+ * they are included in both joininfos.
+ */
+ change_varno(rinfo->clause, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If the clause has the form of "X = X", replace it with null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Expr *leftOp = (Expr *) get_leftop(rinfo->clause);
+ Expr *rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *test = makeNode(NullTest);
+ test->arg = leftOp;
+ test->nulltesttype = IS_NOT_NULL;
+ test->argisrow = false;
+ test->location = -1;
+ rinfo->clause = (Expr *) test;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs,
+ node);
+ }
+
+ for (int i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (int i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ foreach (cell, otherrel->lateral_vars)
+ change_varno(lfirst(cell), toRemove->relid, toKeep->relid);
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueRelInfo *outerinfo = NULL;
+ UniqueRelInfo *innerinfo = NULL;
+ List *outerValues, *innerValues;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outerinfo);
+ if (!outerinfo || !outerinfo->index)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &innerinfo);
+ if (!innerinfo || !innerinfo->index)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outerinfo->index->indexoid != innerinfo->index->indexoid)
+ return false;
+
+ /*
+ * We have proven that for both relations, the same unique index guarantees
+ * that there is at most one row where columns equal given values. These
+ * values must be the same for both relations, or else we won't match the
+ * same row on each side of join. A value may be either Const or Var of some
+ * other relation. For the purposes of this proof, the Vars of the inner and
+ * outer relation are the same, so we replace outer varno with inner and
+ * compare the column values using equal().
+ */
+ innerValues = copyObject(innerinfo->column_values);
+ outerValues = copyObject(outerinfo->column_values);
+ change_varno((Expr *) innerValues, outer->relid, inner->relid);
+ change_varno((Expr *) outerValues, outer->relid, inner->relid);
+ if (!equal(outerValues, innerValues))
+ {
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+ return false;
+ }
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any
+ * relations that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) scratch->targetlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *prev, *cell, *next;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /*
+ * This optimization won't work for tables that have inheritance
+ * children.
+ */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ prev = NULL;
+ for (cell = list_head(*joinlist); cell; cell = next)
+ {
+ Node *node = lfirst(cell);
+
+ next = lnext(cell);
+
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef *) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell, prev);
+ }
+ else
+ prev = cell;
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist, List *targetlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+ scratch.targetlist = targetlist;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 3cedd01c98..8d6036ca7f 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -224,7 +224,7 @@ query_planner(PlannerInfo *root, List *tlist,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -232,6 +232,11 @@ query_planner(PlannerInfo *root, List *tlist,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist, tlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 169e51e792..493f4255dd 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1578,7 +1578,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 4130514952..fde4118257 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -39,14 +39,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -556,7 +552,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -659,7 +655,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -981,7 +977,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -994,9 +990,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1007,8 +1003,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1017,7 +1013,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1043,7 +1039,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1053,7 +1049,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index ffb4cd4bcc..59ca1fe40e 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -272,6 +272,7 @@ typedef enum NodeTag
T_RollupData,
T_GroupingSetData,
T_StatisticExtInfo,
+ T_UniqueRelInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 253e0b7e48..0592b64bb7 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -522,8 +522,11 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of UniqueRelInfos, each of them recording
+ * a set of other rels for which this one has been proven
+ * unique. If this is a baserel that is made unique by an
+ * index, UniqueRelInfo also stores the information about
+ * that index.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
@@ -2471,4 +2474,19 @@ typedef struct JoinCostWorkspace
double inner_rows_total;
} JoinCostWorkspace;
+/*
+ * UniqueRelInfo records a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids. If the uniqueness is
+ * guaranteed by a unique index, this index is also saved. The values that
+ * constrain index columns, be it Vars of outer relations or Consts, are saved
+ * to column_values list.
+ */
+typedef struct UniqueRelInfo
+{
+ NodeTag tag;
+ Relids outerrelids;
+ IndexOptInfo *index;
+ List *column_values;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 574bb85b50..7eb59c1c1e 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -287,6 +287,11 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 36d12bc376..4260227b9f 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -70,9 +70,11 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
* routines to generate index paths
*/
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
+
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index b093a3c8ac..252b8da686 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -16,6 +16,7 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#include "optimizer/paths.h"
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
@@ -95,13 +96,18 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
+
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueRelInfo **unique_info);
+
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree,
+ List *tlist);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85dec..a57905f3ab 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 88fcd52ae1..38e3842278 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4365,11 +4365,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4485,6 +4487,217 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- If index conditions are different for each side, we won't select the same
+-- row on both sides, so the join can't be removed.
+create table sl(a int, b int);
+create unique index sl_ab on sl(a, b);
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Bitmap Heap Scan on sl t1
+ Recheck Cond: (b = 1)
+ -> Bitmap Index Scan on sl_ab
+ Index Cond: (b = 1)
+ -> Materialize
+ -> Bitmap Heap Scan on sl t2
+ Recheck Cond: (b = 2)
+ -> Bitmap Index Scan on sl_ab
+ Index Cond: (b = 2)
+(11 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65de39..b1e248313a 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c247509a56..001b41b8b7 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1569,6 +1569,100 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- Check that attr_needed is updated correctly after self-join removal. In
+-- this test, k1.b is required at either j1 or j2. If this info is lost,
+-- join targetlist for (k1, k2) will not contain k1.b. Use index scan for
+-- k1 so that we don't get 'b' from physical tlist used for seqscan. Also
+-- disable reordering of joins because this test depends on a particular
+-- join tree.
+create table sk (a int, b int);
+create index sk_a_idx on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- If index conditions are different for each side, we won't select the same
+-- row on both sides, so the join can't be removed.
+create table sl(a int, b int);
+create unique index sl_ab on sl(a, b);
+
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
v14-remove-unique-self-joins_delta.patchapplication/octet-stream; name=v14-remove-unique-self-joins_delta.patchDownload
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 0463c477e7..910a738c20 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2270,8 +2270,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_OID_FIELD(userid);
WRITE_BOOL_FIELD(useridiscurrent);
/* we don't try to print fdwroutine or fdw_private */
- WRITE_NODE_FIELD(unique_for_rels);
- /* can't print non_unique_for_rels; BMSes aren't Nodes */
+ /* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_UINT_FIELD(baserestrict_min_security);
WRITE_NODE_FIELD(joininfo);
@@ -2343,19 +2342,6 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
WRITE_BITMAPSET_FIELD(keys);
}
-static void
-_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
-{
- WRITE_NODE_TYPE("UNIQUERELINFO");
-
- WRITE_BITMAPSET_FIELD(outerrelids);
- if (node->index)
- {
- WRITE_OID_FIELD(index->indexoid);
- WRITE_NODE_FIELD(column_values);
- }
-}
-
static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
@@ -4109,9 +4095,6 @@ outNode(StringInfo str, const void *obj)
case T_StatisticExtInfo:
_outStatisticExtInfo(str, obj);
break;
- case T_UniqueRelInfo:
- _outUniqueRelInfo(str, obj);
- break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 88e61d4c54..3434219dbd 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3583,8 +3583,7 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index. If index_info is not null, it is set to
- * point to a new UniqueRelInfo containing the index and conditions.
+ * columns of some unique index.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3605,16 +3604,12 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist,
- UniqueRelInfo **unique_info)
+ List *exprlist, List *oprlist)
{
ListCell *ic;
Assert(list_length(exprlist) == list_length(oprlist));
- if (unique_info)
- *unique_info = NULL;
-
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -3665,7 +3660,6 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
- List *column_values = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3714,9 +3708,6 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
- column_values = lappend(column_values, rinfo->outer_is_left
- ? get_leftop(rinfo->clause)
- : get_rightop(rinfo->clause));
break;
}
}
@@ -3759,22 +3750,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
- {
- if (unique_info)
- {
- /* This may be called in GEQO memory context. */
- MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
- *unique_info = makeNode(UniqueRelInfo);
- (*unique_info)->index = ind;
- (*unique_info)->column_values = list_copy(column_values);
- MemoryContextSwitchTo(oldContext);
- }
- if (column_values)
- list_free(column_values);
return true;
- }
- if (column_values)
- list_free(column_values);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 89cd236306..d8ff4bf432 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,8 +176,7 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false,
- NULL /*index_info*/);
+ false);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -186,8 +185,7 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false,
- NULL /*index_info*/);
+ false);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index a310d89a39..7efcf9f569 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -34,27 +34,29 @@
#include "utils/lsyscache.h"
/* local functions */
-static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
+static bool leftjoin_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
Relids joinrelids);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list, UniqueRelInfo **unique_info);
+ List *clause_list);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist,
- UniqueRelInfo **unique_info);
+ List *restrictlist);
+static void split_selfjoin_quals(PlannerInfo *root, List *joinquals,
+ List **selfjoinquals,
+ List **otherjoinquals);
/*
- * remove_useless_joins
- * Check for relations that don't actually need to be joined at all,
- * and remove them from the query.
+ * remove_useless_left_joins
+ * Check for left joined relations that don't actually need to be joined
+ * at all, and remove them from the query.
*
* We are passed the current joinlist and return the updated list. Other
* data structures that have to be updated are accessible via "root".
@@ -76,11 +78,11 @@ restart:
int nremoved;
/* Skip if not removable */
- if (!join_is_removable(root, sjinfo))
+ if (!leftjoin_is_removable(root, sjinfo))
continue;
/*
- * Currently, join_is_removable can only succeed when the sjinfo's
+ * Currently, leftjoin_is_removable can only succeed when the sjinfo's
* righthand is a single baserel. Remove that rel from the query and
* joinlist.
*/
@@ -148,9 +150,9 @@ clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
}
/*
- * join_is_removable
- * Check whether we need not perform this special join at all, because
- * it will just duplicate its left input.
+ * leftjoin_is_removable
+ * Check whether we need not perform this left join at all, because it will
+ * just duplicate its left input.
*
* This is true for a left join for which the join condition cannot match
* more than one inner-side row. (There are other possibly interesting
@@ -159,7 +161,7 @@ clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
* above the join.
*/
static bool
-join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
+leftjoin_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
{
int innerrelid;
RelOptInfo *innerrel;
@@ -253,8 +255,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
- innerrel, sjinfo->jointype, innerrel->joininfo,
- NULL /*unique_index*/);
+ innerrel, sjinfo->jointype, innerrel->joininfo);
}
/*
@@ -339,7 +340,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
* joinrelids. However, a PHV used at a partner rel could not have the
* target rel in ph_eval_at, so we check that while deciding whether to
* remove or just update the PHV. There is no corresponding test in
- * join_is_removable because it doesn't need to distinguish those cases.
+ * leftjoin_is_removable because it doesn't need to distinguish those
+ * cases.
*/
for (l = list_head(root->placeholder_list); l != NULL; l = nextl)
{
@@ -526,7 +528,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
+ JOIN_SEMI, restrictlist, true))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -601,17 +603,10 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
- *
- * If unique_index is not null, it is set to point to the index that guarantees
- * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
- UniqueRelInfo **unique_info)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
- if (unique_info)
- *unique_info = NULL;
-
/*
* We could skip a couple of tests here if we assume all callers checked
* rel_supports_distinctness first, but it doesn't seem worth taking any
@@ -626,8 +621,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
- unique_info);
+ if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -931,10 +926,6 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
- *
- * If index_info_out is not null, it is set to point to a new UniqueRelInfo
- * allocated in root memory context, that describes the index that guarantees
- * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -943,23 +934,12 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache,
- UniqueRelInfo **unique_info_out)
+ bool force_cache)
{
MemoryContext old_context;
ListCell *lc;
- UniqueRelInfo *unique_info;
-
- if (unique_info_out)
- *unique_info_out = NULL;
- /*
- * It is possible to prove uniqueness even in the absence of joinclauses,
- * just from baserestrictinfos alone. However, in these cases the inner
- * relation returns one row at most, so join removal won't give much
- * benefit. It seems better to save some planning time by ignoring these
- * cases.
- */
+ /* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
return false;
@@ -979,14 +959,10 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- unique_info = (UniqueRelInfo *) lfirst(lc);
+ Relids unique_for_rels = (Relids) lfirst(lc);
- if (bms_is_subset(unique_info->outerrelids, outerrelids))
- {
- if (unique_info_out)
- *unique_info_out = unique_info;
+ if (bms_is_subset(unique_for_rels, outerrelids))
return true; /* Success! */
- }
}
/*
@@ -1003,7 +979,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist, &unique_info))
+ jointype, restrictlist))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1016,15 +992,10 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
- if (!unique_info)
- unique_info = makeNode(UniqueRelInfo);
- unique_info->outerrelids = bms_copy(outerrelids);
- innerrel->unique_for_rels = lappend(innerrel->unique_for_rels, unique_info);
+ innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
+ bms_copy(outerrelids));
MemoryContextSwitchTo(old_context);
- if (unique_info_out)
- *unique_info_out = unique_info;
-
return true; /* Success! */
}
else
@@ -1070,15 +1041,11 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist,
- UniqueRelInfo **unique_info)
+ List *restrictlist)
{
List *clause_list = NIL;
ListCell *lc;
- if (unique_info)
- *unique_info = NULL;
-
/*
* Search for mergejoinable clauses that constrain the inner rel against
* the outer rel. If an operator is mergejoinable then it behaves like
@@ -1116,7 +1083,7 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list, unique_info);
+ return rel_is_distinct_for(root, innerrel, clause_list);
}
typedef struct
@@ -1589,53 +1556,72 @@ remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
}
/*
- * Test whether the relations are joined on the same unique column.
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
*/
-static bool
-is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
- RelOptInfo *inner, List *restrictlist)
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
{
- UniqueRelInfo *outerinfo = NULL;
- UniqueRelInfo *innerinfo = NULL;
- List *outerValues, *innerValues;
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *rjoinquals = NIL;
- innerrel_is_unique(root, joinrelids, inner->relids,
- outer, JOIN_INNER, restrictlist, true, &outerinfo);
- if (!outerinfo || !outerinfo->index)
- return false;
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
- innerrel_is_unique(root, joinrelids, outer->relids,
- inner, JOIN_INNER, restrictlist, true, &innerinfo);
- if (!innerinfo || !innerinfo->index)
- return false;
+ if (bms_num_members(rinfo->clause_relids) != 2)
+ continue;
- /* We must have the same unique index for both relations. */
- if (outerinfo->index->indexoid != innerinfo->index->indexoid)
- return false;
+ expr = (OpExpr *) rinfo->clause;
- /*
- * We have proven that for both relations, the same unique index guarantees
- * that there is at most one row where columns equal given values. These
- * values must be the same for both relations, or else we won't match the
- * same row on each side of join. A value may be either Const or Var of some
- * other relation. For the purposes of this proof, the Vars of the inner and
- * outer relation are the same, so we replace outer varno with inner and
- * compare the column values using equal().
- */
- innerValues = copyObject(innerinfo->column_values);
- outerValues = copyObject(outerinfo->column_values);
- change_varno((Expr *) innerValues, outer->relid, inner->relid);
- change_varno((Expr *) outerValues, outer->relid, inner->relid);
- if (!equal(outerValues, innerValues))
- {
- list_free_deep(outerValues);
- list_free_deep(innerValues);
- return false;
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ continue;
+
+ leftexpr = linitial(expr->args);
+ rightexpr = lsecond(expr->args);
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ continue;
+
+ /* Easy case, both exprs are Vars */
+ if (IsA(leftexpr, Var))
+ {
+ Var *leftvar = (Var *) leftexpr;
+ Var *rightvar = (Var *) rightexpr;
+
+ Assert(leftvar->varlevelsup == 0);
+ Assert(rightvar->varlevelsup == 0);
+
+ /*
+ * Ensure the caller didn't pass us any join quals that are not
+ * self join quals.
+ */
+ Assert(root->simple_rte_array[leftvar->varno]->relid ==
+ root->simple_rte_array[rightvar->varno]->relid);
+
+ if (leftvar->varattno == rightvar->varattno)
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ rjoinquals = lappend(rjoinquals, rinfo);
+
+ }
+
+ /* TODO handle complex exprs by replacing varnos on RHS expr */
}
- list_free_deep(outerValues);
- list_free_deep(innerValues);
- return true;
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = rjoinquals;
}
/*
@@ -1662,7 +1648,9 @@ remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
for (i = o + 1; i < n; i++)
{
RelOptInfo *inner = root->simple_rel_array[relids[i]];
- List *restrictlist;
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
/* A sanity check: the relations have the same Oid. */
Assert(root->simple_rte_array[relids[i]]->relid
@@ -1688,8 +1676,24 @@ remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
/* Is it a unique self join? */
restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
inner);
- if (!is_unique_self_join(root, joinrelids, outer, inner,
- restrictlist))
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals, list_length(otherjoinquals) == 0))
continue;
/*
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 493f4255dd..169e51e792 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1578,8 +1578,7 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators,
- NULL /*index_info*/))
+ sjinfo->semi_operators))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 59ca1fe40e..ffb4cd4bcc 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -272,7 +272,6 @@ typedef enum NodeTag
T_RollupData,
T_GroupingSetData,
T_StatisticExtInfo,
- T_UniqueRelInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0592b64bb7..253e0b7e48 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -522,11 +522,8 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of UniqueRelInfos, each of them recording
- * a set of other rels for which this one has been proven
- * unique. If this is a baserel that is made unique by an
- * index, UniqueRelInfo also stores the information about
- * that index.
+ * unique_for_rels - list of Relid sets, each one being a set of other
+ * rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
@@ -2474,19 +2471,4 @@ typedef struct JoinCostWorkspace
double inner_rows_total;
} JoinCostWorkspace;
-/*
- * UniqueRelInfo records a fact that a relation is unique when being joined
- * to other relation(s) specified by outerrelids. If the uniqueness is
- * guaranteed by a unique index, this index is also saved. The values that
- * constrain index columns, be it Vars of outer relations or Consts, are saved
- * to column_values list.
- */
-typedef struct UniqueRelInfo
-{
- NodeTag tag;
- Relids outerrelids;
- IndexOptInfo *index;
- List *column_values;
-} UniqueRelInfo;
-
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 4260227b9f..36d12bc376 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -70,11 +70,9 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
* routines to generate index paths
*/
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
-
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist,
- UniqueRelInfo **info);
+ List *exprlist, List *oprlist);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 252b8da686..15fa476907 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -103,8 +103,7 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache,
- UniqueRelInfo **unique_info);
+ JoinType jointype, List *restrictlist, bool force_cache);
extern void remove_useless_self_joins(PlannerInfo *root, List **jointree,
List *tlist);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 38e3842278..71d6cf7dc7 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4365,13 +4365,11 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
-------------------------------------------------
+ QUERY PLAN
+--------------------------
Result
One-Time Filter: false
- -> Index Scan using parent_pkey on parent x
- Index Cond: (k = 1)
-(4 rows)
+(2 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4674,27 +4672,19 @@ explain (costs off) select 1 from
reset join_collapse_limit;
reset enable_seqscan;
--- If index conditions are different for each side, we won't select the same
--- row on both sides, so the join can't be removed.
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
create table sl(a int, b int);
create unique index sl_ab on sl(a, b);
explain (costs off)
select * from sl t1, sl t2
where t1.a = t2.a and t1.b = 1 and t2.b = 2;
- QUERY PLAN
-----------------------------------------------
- Nested Loop
- Join Filter: (t1.a = t2.a)
- -> Bitmap Heap Scan on sl t1
- Recheck Cond: (b = 1)
- -> Bitmap Index Scan on sl_ab
- Index Cond: (b = 1)
- -> Materialize
- -> Bitmap Heap Scan on sl t2
- Recheck Cond: (b = 2)
- -> Bitmap Index Scan on sl_ab
- Index Cond: (b = 2)
-(11 rows)
+ QUERY PLAN
+---------------------------------------------------------
+ Index Only Scan using sl_ab on sl t2
+ Index Cond: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
reset enable_hashjoin;
reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 001b41b8b7..262b3988f8 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1651,8 +1651,9 @@ explain (costs off) select 1 from
reset join_collapse_limit;
reset enable_seqscan;
--- If index conditions are different for each side, we won't select the same
--- row on both sides, so the join can't be removed.
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
create table sl(a int, b int);
create unique index sl_ab on sl(a, b);
On 3/25/19 07:07, David Rowley wrote:
You had commented the test with:
-- If index conditions are different for each side, we won't select the same
-- row on both sides, so the join can't be removed.but I don't quite understand why we can't remove the join in this
situation.
My rationale was that we're not exactly removing the join, but replacing
it with a scan. So it is not enough to just have a single row on each
side, it must be the same physical row. In this example, before the
transformation, t1.b is not equal to t2.b, but they become equal
afterwards. This looks somewhat wrong. On the other hand, if the
conditions are different, the resulting condition is going to evaluate
to constant false and we won't get any rows, so maybe it's OK.
This brings me again to the question of what are the conditions for join
removal. If the formulation with indexes is not general enough, what do
we use instead? I guess it could be something like this:
1. Given the (btree equality) join and restriction clauses, both sides
are unique on the same set of columns. That is, if we fix the values of
these columns, both sides have at most one matching row.
a. For each of these columns, we have either
i) a join clause that equates some expression referencing the
outer column to the same expression referencing the same inner column.
ii) a clause for each relation that equates the same expression
referencing the outer and inner column to some other arbitrary
expression, possibly a different one for each side. This expression may
be a Const or some expression that references a Var of some third relation.
2. All the resulting columns can be calculated using either side of the
join. For now, just require that both sides are base relations that
refer to the same physical relation.
Two points are not clear to me here:
1. We don't handle join clauses to third relations, but can they be
treated the same way we treat Consts?
2. Can we simplify the join when we don't have any join clauses and only
have Consts? Or should we have at least one join clause that equates the
same inner and outer column? Why is one join clause enough?
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
I noticed you lost a couple of test cases in v14, so I added them back.
I also added a case where your implementation returns a different number
of rows compared to vanilla:
select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
Please see the attached v15.
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
v15-0001-Remove-self-joins.patchtext/x-patch; name=v15-0001-Remove-self-joins.patchDownload
From 2b1350df99b5e59f9ad7e516806800d09256beea Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Date: Mon, 25 Mar 2019 18:10:52 +0300
Subject: [PATCH v15] Remove self joins.
---
src/backend/optimizer/plan/analyzejoins.c | 961 ++++++++++++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 7 +-
src/backend/optimizer/util/relnode.c | 26 +-
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/planmain.h | 7 +-
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 260 ++++++++
src/test/regress/sql/equivclass.sql | 17 +
src/test/regress/sql/join.sql | 117 ++++
9 files changed, 1350 insertions(+), 82 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index a4efa69..7efcf9f 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -33,7 +34,7 @@
#include "utils/lsyscache.h"
/* local functions */
-static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
+static bool leftjoin_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
Relids joinrelids);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
@@ -47,18 +48,21 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist);
+static void split_selfjoin_quals(PlannerInfo *root, List *joinquals,
+ List **selfjoinquals,
+ List **otherjoinquals);
/*
- * remove_useless_joins
- * Check for relations that don't actually need to be joined at all,
- * and remove them from the query.
+ * remove_useless_left_joins
+ * Check for left joined relations that don't actually need to be joined
+ * at all, and remove them from the query.
*
* We are passed the current joinlist and return the updated list. Other
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -74,11 +78,11 @@ restart:
int nremoved;
/* Skip if not removable */
- if (!join_is_removable(root, sjinfo))
+ if (!leftjoin_is_removable(root, sjinfo))
continue;
/*
- * Currently, join_is_removable can only succeed when the sjinfo's
+ * Currently, leftjoin_is_removable can only succeed when the sjinfo's
* righthand is a single baserel. Remove that rel from the query and
* joinlist.
*/
@@ -146,9 +150,9 @@ clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
}
/*
- * join_is_removable
- * Check whether we need not perform this special join at all, because
- * it will just duplicate its left input.
+ * leftjoin_is_removable
+ * Check whether we need not perform this left join at all, because it will
+ * just duplicate its left input.
*
* This is true for a left join for which the join condition cannot match
* more than one inner-side row. (There are other possibly interesting
@@ -157,12 +161,11 @@ clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
* above the join.
*/
static bool
-join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
+leftjoin_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
{
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -238,67 +241,23 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -381,7 +340,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
* joinrelids. However, a PHV used at a partner rel could not have the
* target rel in ph_eval_at, so we check that while deciding whether to
* remove or just update the PHV. There is no corresponding test in
- * join_is_removable because it doesn't need to distinguish those cases.
+ * leftjoin_is_removable because it doesn't need to distinguish those
+ * cases.
*/
for (l = list_head(root->placeholder_list); l != NULL; l = nextl)
{
@@ -1125,3 +1085,874 @@ is_innerrel_unique_for(PlannerInfo *root,
/* Let rel_is_distinct_for() do the hard work */
return rel_is_distinct_for(root, innerrel, clause_list);
}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ ((Var *) node)->varnoold = context->newRelid;
+ return false;
+ }
+
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell;
+ ListCell *next;
+
+ for (cell = list_head(ec->ec_members); cell; cell = next)
+ {
+ ListCell *otherCell;
+ EquivalenceMember *em = lfirst(cell);
+
+ next = lnext(cell);
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ prev = cell;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ Assert(em->em_nullable_relids == NULL);
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ foreach (otherCell, ec->ec_members)
+ {
+ EquivalenceMember *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(EquivalenceMember, lfirst(otherCell));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (otherCell)
+ ec->ec_members = list_delete_cell(ec->ec_members, cell, prev);
+ else
+ prev = cell;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell;
+ ListCell *next;
+
+ for (cell = list_head(*sources); cell; cell = next)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+
+ next = lnext(cell);
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ prev = cell;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach (otherCell, *sources)
+ {
+ RestrictInfo *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(RestrictInfo, lfirst(otherCell));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (otherCell)
+ *sources = list_delete_cell(*sources, cell, prev);
+ else
+ {
+ prev = cell;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_relid(&rinfo->required_relids, toRemove, toKeep);
+ change_relid(&rinfo->left_relids, toRemove, toKeep);
+ change_relid(&rinfo->right_relids, toRemove, toKeep);
+ change_relid(&rinfo->clause_relids, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id. Each
+ * element is a set of relation ids with which this relation has a special
+ * join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+
+ /*
+ * Top-level targetlist of the query. We have to update any references
+ * it has to the relations we remove.
+ */
+ List *targetlist;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ List *toAppend;
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+ ListCell *otherCell;
+
+ /*
+ * We can't have an EC-derived clause that joins to some third
+ * relation
+ */
+ Assert(!(is_join_clause && rinfo->parent_ec != NULL));
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because it
+ * may be already present in the joininfo or baserestrictinfo. Still,
+ * we have to switch it to point to the remaining relation. This is
+ * important for join clauses that reference both relations, because
+ * they are included in both joininfos.
+ */
+ change_varno(rinfo->clause, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If the clause has the form of "X = X", replace it with null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Expr *leftOp = (Expr *) get_leftop(rinfo->clause);
+ Expr *rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *test = makeNode(NullTest);
+ test->arg = leftOp;
+ test->nulltesttype = IS_NOT_NULL;
+ test->argisrow = false;
+ test->location = -1;
+ rinfo->clause = (Expr *) test;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs,
+ node);
+ }
+
+ for (int i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (int i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ foreach (cell, otherrel->lateral_vars)
+ change_varno(lfirst(cell), toRemove->relid, toKeep->relid);
+ }
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *rjoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ if (bms_num_members(rinfo->clause_relids) != 2)
+ continue;
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ continue;
+
+ leftexpr = linitial(expr->args);
+ rightexpr = lsecond(expr->args);
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ continue;
+
+ /* Easy case, both exprs are Vars */
+ if (IsA(leftexpr, Var))
+ {
+ Var *leftvar = (Var *) leftexpr;
+ Var *rightvar = (Var *) rightexpr;
+
+ Assert(leftvar->varlevelsup == 0);
+ Assert(rightvar->varlevelsup == 0);
+
+ /*
+ * Ensure the caller didn't pass us any join quals that are not
+ * self join quals.
+ */
+ Assert(root->simple_rte_array[leftvar->varno]->relid ==
+ root->simple_rte_array[rightvar->varno]->relid);
+
+ if (leftvar->varattno == rightvar->varattno)
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ rjoinquals = lappend(rjoinquals, rinfo);
+
+ }
+
+ /* TODO handle complex exprs by replacing varnos on RHS expr */
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = rjoinquals;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any
+ * relations that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals, list_length(otherjoinquals) == 0))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) scratch->targetlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *prev, *cell, *next;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /*
+ * This optimization won't work for tables that have inheritance
+ * children.
+ */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ prev = NULL;
+ for (cell = list_head(*joinlist); cell; cell = next)
+ {
+ Node *node = lfirst(cell);
+
+ next = lnext(cell);
+
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef *) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell, prev);
+ }
+ else
+ prev = cell;
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist, List *targetlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+ scratch.targetlist = targetlist;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
+}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 3cedd01..8d6036c 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -224,7 +224,7 @@ query_planner(PlannerInfo *root, List *tlist,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -233,6 +233,11 @@ query_planner(PlannerInfo *root, List *tlist,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist, tlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 4130514..fde4118 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -39,14 +39,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -556,7 +552,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -659,7 +655,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -981,7 +977,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -994,9 +990,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1007,8 +1003,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1017,7 +1013,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1043,7 +1039,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1053,7 +1049,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 574bb85..7eb59c1 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -287,6 +287,11 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index b093a3c..15fa476 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -16,6 +16,7 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#include "optimizer/paths.h"
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
@@ -95,14 +96,18 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
+
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree,
+ List *tlist);
+
/*
* prototypes for plan/setrefs.c
*/
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85..a57905f 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 88fcd52..6b6f490 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4486,6 +4486,266 @@ select * from
(0 rows)
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- We need an index on two columns for the next couple of tests.
+create table sl(a int, b int);
+insert into sl values (1, 1), (1, 2), (2, 1);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sl t2
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- This is broken: one of these queries returns a different number of rows
+-- compared to vanilla
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+ a | b | a | b
+---+---+---+---
+ 1 | 1 | 1 | 1
+ 1 | 1 | 1 | 2
+ 2 | 1 | 2 | 1
+(3 rows)
+
+select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+ a | b | a | b
+---+---+---+---
+ 1 | 1 | 1 | 1
+ 1 | 2 | 1 | 1
+ 2 | 1 | 2 | 1
+(3 rows)
+
+-- Check that we can cope with multiple matching unique indexes on each side
+create table sm(a int unique, b int unique, c int unique);
+explain (costs off)
+select * from sm m, sm n where m.a = n.b and m.c = n.c;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sm n
+ Filter: ((c IS NOT NULL) AND (a = b))
+(2 rows)
+
+explain (costs off)
+select * from sm m, sm n where m.a = n.c and m.b = n.b;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sm n
+ Filter: ((b IS NOT NULL) AND (a = c))
+(2 rows)
+
+explain (costs off)
+select * from sm m, sm n where m.c = n.b and m.a = n.a;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sm n
+ Filter: ((a IS NOT NULL) AND (c = b))
+(2 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+--
-- Test hints given on incorrect column references are useful
--
select t1.uunique1 from
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65d..b1e2483 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c247509..e4d3757 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1570,6 +1570,123 @@ select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- We need an index on two columns for the next couple of tests.
+create table sl(a int, b int);
+insert into sl values (1, 1), (1, 2), (2, 1);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- This is broken: one of these queries returns a different number of rows
+-- compared to vanilla
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+
+-- Check that we can cope with multiple matching unique indexes on each side
+create table sm(a int unique, b int unique, c int unique);
+explain (costs off)
+select * from sm m, sm n where m.a = n.b and m.c = n.c;
+explain (costs off)
+select * from sm m, sm n where m.a = n.c and m.b = n.b;
+explain (costs off)
+select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
+--
-- Test hints given on incorrect column references are useful
--
--
2.7.4
On 3/25/19 18:13, Alexander Kuzmenkov wrote:
Please see the attached v15.
I won't be able to continue working on this because I'm changing jobs.
My colleague Arseny Sher is probably going to take over.
Here is a v16 that is a rebased v12, plus renames from v15, plus a
couple of bug fixes (use bms_is_empty instead of a NULL check, and
properly create a NullTest clause when replacing "X = X").
--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-Remove-self-joins-v16.patchtext/x-patch; name=0001-Remove-self-joins-v16.patchDownload
From 7ef160823a4051cad54852783c566c51366c2a8a Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Date: Mon, 13 May 2019 19:11:10 +0300
Subject: [PATCH] Remove self joins v16
---
src/backend/nodes/outfuncs.c | 19 +-
src/backend/optimizer/path/indxpath.c | 28 +-
src/backend/optimizer/path/joinpath.c | 6 +-
src/backend/optimizer/plan/analyzejoins.c | 981 +++++++++++++++++++++++++++---
src/backend/optimizer/plan/planmain.c | 7 +-
src/backend/optimizer/util/pathnode.c | 3 +-
src/backend/optimizer/util/relnode.c | 26 +-
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 22 +-
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 4 +-
src/include/optimizer/planmain.h | 9 +-
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 248 +++++++-
src/test/regress/sql/equivclass.sql | 17 +
src/test/regress/sql/join.sql | 127 ++++
16 files changed, 1436 insertions(+), 99 deletions(-)
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 387e4b9..8c6d755 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2271,7 +2271,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_OID_FIELD(userid);
WRITE_BOOL_FIELD(useridiscurrent);
/* we don't try to print fdwroutine or fdw_private */
- /* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+ WRITE_NODE_FIELD(unique_for_rels);
+ /* can't print non_unique_for_rels; BMSes aren't Nodes */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_UINT_FIELD(baserestrict_min_security);
WRITE_NODE_FIELD(joininfo);
@@ -2344,6 +2345,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
}
static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+ WRITE_NODE_TYPE("UNIQUERELINFO");
+
+ WRITE_BITMAPSET_FIELD(outerrelids);
+ if (node->index)
+ {
+ WRITE_OID_FIELD(index->indexoid);
+ WRITE_NODE_FIELD(column_values);
+ }
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -4109,6 +4123,9 @@ outNode(StringInfo str, const void *obj)
case T_StatisticExtInfo:
_outStatisticExtInfo(str, obj);
break;
+ case T_UniqueRelInfo:
+ _outUniqueRelInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 3434219..88e61d4 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3583,7 +3583,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueRelInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3604,12 +3605,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **unique_info)
{
ListCell *ic;
Assert(list_length(exprlist) == list_length(oprlist));
+ if (unique_info)
+ *unique_info = NULL;
+
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -3660,6 +3665,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *column_values = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3708,6 +3714,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ column_values = lappend(column_values, rinfo->outer_is_left
+ ? get_leftop(rinfo->clause)
+ : get_rightop(rinfo->clause));
break;
}
}
@@ -3750,7 +3759,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (unique_info)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *unique_info = makeNode(UniqueRelInfo);
+ (*unique_info)->index = ind;
+ (*unique_info)->column_values = list_copy(column_values);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (column_values)
+ list_free(column_values);
return true;
+ }
+ if (column_values)
+ list_free(column_values);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index d8ff4bf..89cd236 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index a4efa69..71320f4 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -29,6 +30,7 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
@@ -39,14 +41,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueRelInfo **unique_info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ UniqueRelInfo **unique_info);
/*
@@ -58,7 +61,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -162,7 +165,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -238,67 +240,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -568,7 +527,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -643,10 +602,17 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueRelInfo **unique_info)
{
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* We could skip a couple of tests here if we assume all callers checked
* rel_supports_distinctness first, but it doesn't seem worth taking any
@@ -661,8 +627,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ unique_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -966,6 +932,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueRelInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -974,12 +944,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueRelInfo **unique_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *unique_info;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ if (unique_info_out)
+ *unique_info_out = NULL;
+
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -999,10 +980,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ unique_info = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(unique_info->outerrelids, outerrelids))
+ {
+ if (unique_info_out)
+ *unique_info_out = unique_info;
return true; /* Success! */
+ }
}
/*
@@ -1019,7 +1004,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &unique_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1032,10 +1017,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
- innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ if (!unique_info)
+ unique_info = makeNode(UniqueRelInfo);
+ unique_info->outerrelids = bms_copy(outerrelids);
+ innerrel->unique_for_rels = lappend(innerrel->unique_for_rels, unique_info);
MemoryContextSwitchTo(old_context);
+ if (unique_info_out)
+ *unique_info_out = unique_info;
+
return true; /* Success! */
}
else
@@ -1081,11 +1071,15 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueRelInfo **unique_info)
{
List *clause_list = NIL;
ListCell *lc;
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* Search for mergejoinable clauses that constrain the inner rel against
* the outer rel. If an operator is mergejoinable then it behaves like
@@ -1123,5 +1117,854 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, unique_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ ((Var *) node)->varnoold = context->newRelid;
+ return false;
+ }
+
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell;
+ ListCell *next;
+
+ for (cell = list_head(ec->ec_members); cell; cell = next)
+ {
+ ListCell *otherCell;
+ EquivalenceMember *em = lfirst(cell);
+
+ next = lnext(cell);
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ prev = cell;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ Assert(bms_is_empty(em->em_nullable_relids));
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ foreach (otherCell, ec->ec_members)
+ {
+ EquivalenceMember *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(EquivalenceMember, lfirst(otherCell));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (otherCell)
+ ec->ec_members = list_delete_cell(ec->ec_members, cell, prev);
+ else
+ prev = cell;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell;
+ ListCell *next;
+
+ for (cell = list_head(*sources); cell; cell = next)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+
+ next = lnext(cell);
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ prev = cell;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach (otherCell, *sources)
+ {
+ RestrictInfo *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(RestrictInfo, lfirst(otherCell));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (otherCell)
+ *sources = list_delete_cell(*sources, cell, prev);
+ else
+ {
+ prev = cell;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_relid(&rinfo->required_relids, toRemove, toKeep);
+ change_relid(&rinfo->left_relids, toRemove, toKeep);
+ change_relid(&rinfo->right_relids, toRemove, toKeep);
+ change_relid(&rinfo->clause_relids, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id. Each
+ * element is a set of relation ids with which this relation has a special
+ * join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ List *toAppend;
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+ ListCell *otherCell;
+
+ /*
+ * We can't have an EC-derived clause that joins to some third
+ * relation
+ */
+ Assert(!(is_join_clause && rinfo->parent_ec != NULL));
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because it
+ * may be already present in the joininfo or baserestrictinfo. Still,
+ * we have to switch it to point to the remaining relation. This is
+ * important for join clauses that reference both relations, because
+ * they are included in both joininfos.
+ */
+ change_varno(rinfo->clause, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X = X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Expr *leftOp, *rightOp;
+
+ Assert(IsA(rinfo->clause, OpExpr));
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ RestrictInfo *newRinfo;
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+
+ newRinfo = make_restrictinfo((Expr *) nullTest,
+ rinfo->is_pushed_down, rinfo->outerjoin_delayed,
+ rinfo->pseudoconstant, rinfo->security_level,
+ /* required_relids defaults to clause_relids */
+ NULL,
+ rinfo->outer_relids, rinfo->nullable_relids);
+
+ /*
+ * Mark the new rinfo as derived from the same EC as the original
+ * one, so that we can detect duplicates.
+ */
+ newRinfo->parent_ec = rinfo->parent_ec;
+
+ rinfo = newRinfo;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs,
+ node);
+ }
+
+ for (int i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (int i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ foreach (cell, otherrel->lateral_vars)
+ change_varno(lfirst(cell), toRemove->relid, toKeep->relid);
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueRelInfo *outerinfo = NULL;
+ UniqueRelInfo *innerinfo = NULL;
+ List *outerValues, *innerValues;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outerinfo);
+ if (!outerinfo || !outerinfo->index)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &innerinfo);
+ if (!innerinfo || !innerinfo->index)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outerinfo->index->indexoid != innerinfo->index->indexoid)
+ return false;
+
+ /*
+ * We have proven that for both relations, the same unique index guarantees
+ * that there is at most one row where columns equal given values. These
+ * values must be the same for both relations, or else we won't match the
+ * same row on each side of join. A value may be either Const or Var of some
+ * other relation. For the purposes of this proof, the Vars of the inner and
+ * outer relation are the same, so we replace outer varno with inner and
+ * compare the column values using equal().
+ */
+ innerValues = copyObject(innerinfo->column_values);
+ outerValues = copyObject(outerinfo->column_values);
+ change_varno((Expr *) innerValues, outer->relid, inner->relid);
+ change_varno((Expr *) outerValues, outer->relid, inner->relid);
+ if (!equal(outerValues, innerValues))
+ {
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+ return false;
+ }
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any
+ * relations that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *prev, *cell, *next;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /*
+ * This optimization won't work for tables that have inheritance
+ * children.
+ */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ prev = NULL;
+ for (cell = list_head(*joinlist); cell; cell = next)
+ {
+ Node *node = lfirst(cell);
+
+ next = lnext(cell);
+
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef *) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell, prev);
+ }
+ else
+ prev = cell;
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 2dbf1db..e73e45c 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -220,7 +220,7 @@ query_planner(PlannerInfo *root,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -229,6 +229,11 @@ query_planner(PlannerInfo *root,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 36aee35..dce1f98 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1622,7 +1622,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 272e2eb..3e320f3 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -39,14 +39,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -585,7 +581,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -688,7 +684,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1010,7 +1006,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1023,9 +1019,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1036,8 +1032,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1046,7 +1042,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1072,7 +1068,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1082,7 +1078,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index ffb4cd4..59ca1fe 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -272,6 +272,7 @@ typedef enum NodeTag
T_RollupData,
T_GroupingSetData,
T_StatisticExtInfo,
+ T_UniqueRelInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4b7703d..e1ef204 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -532,8 +532,11 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of UniqueRelInfos, each of them recording
+ * a set of other rels for which this one has been proven
+ * unique. If this is a baserel that is made unique by an
+ * index, UniqueRelInfo also stores the information about
+ * that index.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
@@ -2499,4 +2502,19 @@ typedef struct JoinCostWorkspace
double inner_rows_total;
} JoinCostWorkspace;
+/*
+ * UniqueRelInfo records a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids. If the uniqueness is
+ * guaranteed by a unique index, this index is also saved. The values that
+ * constrain index columns, be it Vars of outer relations or Consts, are saved
+ * to column_values list.
+ */
+typedef struct UniqueRelInfo
+{
+ NodeTag tag;
+ Relids outerrelids;
+ IndexOptInfo *index;
+ List *column_values;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 437250f..0bc5f0a 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -291,6 +291,11 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 0e85809..56a25bc 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -70,9 +70,11 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
* routines to generate index paths
*/
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
+
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 6d10bf3..e919ab4 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -16,6 +16,7 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#include "optimizer/paths.h"
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
@@ -96,13 +97,17 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
+
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueRelInfo **unique_info);
+
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85..a57905f 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 07e631d..fec78c5 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4365,11 +4365,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4486,6 +4488,246 @@ select * from
(0 rows)
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Function Scan on generate_series gs
+(4 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+reset enable_hashjoin;
+reset enable_mergejoin;
+--
-- Test hints given on incorrect column references are useful
--
select t1.uunique1 from
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65d..b1e2483 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index bf6d5c3..c2033cf 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1570,6 +1570,133 @@ select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references
+explain (costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
+--
-- Test hints given on incorrect column references are useful
--
--
2.7.4
Alexander Lakhin detected the bug in the 'remove self joins' patch:
test:
=====
CREATE TABLE t (a INT UNIQUE);
INSERT INTO t VALUES (1);
SELECT 1 FROM (SELECT x.* FROM t AS x, t AS y WHERE x.a = y.a) AS q,
LATERAL generate_series(1, q.a) gs(i);
Description:
============
FUNCTIONS, TABLEFUNCS and VALUES plan nodes uses direct link to the rte
table. We need to change varno references to relid which will be kept.
Version v.17 of the patch that fix the bug see in attachment.
--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
Attachments:
0001-Remove-Self-Joins-v17.patchtext/x-patch; name=0001-Remove-Self-Joins-v17.patchDownload
From 78f043e777dec93dca9a41f16f6197e78afa44a1 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Thu, 27 Jun 2019 11:28:01 +0500
Subject: [PATCH] Remove Self Joins v.17
---
src/backend/nodes/outfuncs.c | 19 +-
src/backend/optimizer/path/indxpath.c | 28 +-
src/backend/optimizer/path/joinpath.c | 6 +-
src/backend/optimizer/plan/analyzejoins.c | 1004 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 7 +-
src/backend/optimizer/util/pathnode.c | 3 +-
src/backend/optimizer/util/relnode.c | 26 +-
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 22 +-
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/paths.h | 3 +-
src/include/optimizer/planmain.h | 7 +-
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 256 +++++-
src/test/regress/sql/equivclass.sql | 17 +
src/test/regress/sql/join.sql | 127 +++
16 files changed, 1463 insertions(+), 99 deletions(-)
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 8400dd319e..4ac7e53eb9 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2268,7 +2268,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_OID_FIELD(userid);
WRITE_BOOL_FIELD(useridiscurrent);
/* we don't try to print fdwroutine or fdw_private */
- /* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+ WRITE_NODE_FIELD(unique_for_rels);
+ /* can't print non_unique_for_rels; BMSes aren't Nodes */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_UINT_FIELD(baserestrict_min_security);
WRITE_NODE_FIELD(joininfo);
@@ -2340,6 +2341,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
WRITE_BITMAPSET_FIELD(keys);
}
+static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+ WRITE_NODE_TYPE("UNIQUERELINFO");
+
+ WRITE_BITMAPSET_FIELD(outerrelids);
+ if (node->index)
+ {
+ WRITE_OID_FIELD(index->indexoid);
+ WRITE_NODE_FIELD(column_values);
+ }
+}
+
static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
@@ -4106,6 +4120,9 @@ outNode(StringInfo str, const void *obj)
case T_StatisticExtInfo:
_outStatisticExtInfo(str, obj);
break;
+ case T_UniqueRelInfo:
+ _outUniqueRelInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c208e9bfb0..ad66394dbd 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3583,7 +3583,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueRelInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3604,12 +3605,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **unique_info)
{
ListCell *ic;
Assert(list_length(exprlist) == list_length(oprlist));
+ if (unique_info)
+ *unique_info = NULL;
+
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -3660,6 +3665,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *column_values = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3708,6 +3714,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ column_values = lappend(column_values, rinfo->outer_is_left
+ ? get_leftop(rinfo->clause)
+ : get_rightop(rinfo->clause));
break;
}
}
@@ -3750,7 +3759,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (unique_info)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *unique_info = makeNode(UniqueRelInfo);
+ (*unique_info)->index = ind;
+ (*unique_info)->column_values = list_copy(column_values);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (column_values)
+ list_free(column_values);
return true;
+ }
+ if (column_values)
+ list_free(column_values);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index dc28b56e74..450d2d4048 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 32695db367..8e2bcfaa33 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -29,6 +30,7 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
@@ -39,14 +41,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueRelInfo **unique_info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ UniqueRelInfo **unique_info);
/*
@@ -58,7 +61,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -162,7 +165,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -238,67 +240,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -568,7 +527,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -643,10 +602,17 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueRelInfo **unique_info)
{
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* We could skip a couple of tests here if we assume all callers checked
* rel_supports_distinctness first, but it doesn't seem worth taking any
@@ -661,8 +627,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ unique_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -966,6 +932,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueRelInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -974,12 +944,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueRelInfo **unique_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *unique_info;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ if (unique_info_out)
+ *unique_info_out = NULL;
+
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -999,10 +980,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ unique_info = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(unique_info->outerrelids, outerrelids))
+ {
+ if (unique_info_out)
+ *unique_info_out = unique_info;
return true; /* Success! */
+ }
}
/*
@@ -1019,7 +1004,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &unique_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1032,10 +1017,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
- innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ if (!unique_info)
+ unique_info = makeNode(UniqueRelInfo);
+ unique_info->outerrelids = bms_copy(outerrelids);
+ innerrel->unique_for_rels = lappend(innerrel->unique_for_rels, unique_info);
MemoryContextSwitchTo(old_context);
+ if (unique_info_out)
+ *unique_info_out = unique_info;
+
return true; /* Success! */
}
else
@@ -1081,11 +1071,15 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueRelInfo **unique_info)
{
List *clause_list = NIL;
ListCell *lc;
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* Search for mergejoinable clauses that constrain the inner rel against
* the outer rel. If an operator is mergejoinable then it behaves like
@@ -1123,5 +1117,877 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, unique_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ ((Var *) node)->varnoold = context->newRelid;
+ return false;
+ }
+
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell;
+ ListCell *next;
+
+ for (cell = list_head(ec->ec_members); cell; cell = next)
+ {
+ ListCell *otherCell;
+ EquivalenceMember *em = lfirst(cell);
+
+ next = lnext(cell);
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ prev = cell;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ Assert(bms_is_empty(em->em_nullable_relids));
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ foreach (otherCell, ec->ec_members)
+ {
+ EquivalenceMember *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(EquivalenceMember, lfirst(otherCell));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (otherCell)
+ ec->ec_members = list_delete_cell(ec->ec_members, cell, prev);
+ else
+ prev = cell;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ ListCell *prev = NULL;
+ ListCell *cell;
+ ListCell *next;
+
+ for (cell = list_head(*sources); cell; cell = next)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+
+ next = lnext(cell);
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ prev = cell;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach (otherCell, *sources)
+ {
+ RestrictInfo *other;
+
+ if (otherCell == cell)
+ continue;
+
+ other = castNode(RestrictInfo, lfirst(otherCell));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (otherCell)
+ *sources = list_delete_cell(*sources, cell, prev);
+ else
+ {
+ prev = cell;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_relid(&rinfo->required_relids, toRemove, toKeep);
+ change_relid(&rinfo->left_relids, toRemove, toKeep);
+ change_relid(&rinfo->right_relids, toRemove, toKeep);
+ change_relid(&rinfo->clause_relids, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id. Each
+ * element is a set of relation ids with which this relation has a special
+ * join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ List *toAppend;
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+ ListCell *otherCell;
+
+ /*
+ * We can't have an EC-derived clause that joins to some third
+ * relation
+ */
+ Assert(!(is_join_clause && rinfo->parent_ec != NULL));
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because it
+ * may be already present in the joininfo or baserestrictinfo. Still,
+ * we have to switch it to point to the remaining relation. This is
+ * important for join clauses that reference both relations, because
+ * they are included in both joininfos.
+ */
+ change_varno(rinfo->clause, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X = X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Expr *leftOp, *rightOp;
+
+ Assert(IsA(rinfo->clause, OpExpr));
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ RestrictInfo *newRinfo;
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+
+ newRinfo = make_restrictinfo((Expr *) nullTest,
+ rinfo->is_pushed_down, rinfo->outerjoin_delayed,
+ rinfo->pseudoconstant, rinfo->security_level,
+ /* required_relids defaults to clause_relids */
+ NULL,
+ rinfo->outer_relids, rinfo->nullable_relids);
+
+ /*
+ * Mark the new rinfo as derived from the same EC as the original
+ * one, so that we can detect duplicates.
+ */
+ newRinfo->parent_ec = rinfo->parent_ec;
+
+ rinfo = newRinfo;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs,
+ node);
+ }
+
+ for (int i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (int i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueRelInfo *outerinfo = NULL;
+ UniqueRelInfo *innerinfo = NULL;
+ List *outerValues, *innerValues;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outerinfo);
+ if (!outerinfo || !outerinfo->index)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &innerinfo);
+ if (!innerinfo || !innerinfo->index)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outerinfo->index->indexoid != innerinfo->index->indexoid)
+ return false;
+
+ /*
+ * We have proven that for both relations, the same unique index guarantees
+ * that there is at most one row where columns equal given values. These
+ * values must be the same for both relations, or else we won't match the
+ * same row on each side of join. A value may be either Const or Var of some
+ * other relation. For the purposes of this proof, the Vars of the inner and
+ * outer relation are the same, so we replace outer varno with inner and
+ * compare the column values using equal().
+ */
+ innerValues = copyObject(innerinfo->column_values);
+ outerValues = copyObject(outerinfo->column_values);
+ change_varno((Expr *) innerValues, outer->relid, inner->relid);
+ change_varno((Expr *) outerValues, outer->relid, inner->relid);
+ if (!equal(outerValues, innerValues))
+ {
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+ return false;
+ }
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any
+ * relations that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *prev, *cell, *next;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /*
+ * This optimization won't work for tables that have inheritance
+ * children.
+ */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ prev = NULL;
+ for (cell = list_head(*joinlist); cell; cell = next)
+ {
+ Node *node = lfirst(cell);
+
+ next = lnext(cell);
+
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef *) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell, prev);
+ }
+ else
+ prev = cell;
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 2dbf1db844..e73e45c000 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -220,7 +220,7 @@ query_planner(PlannerInfo *root,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -228,6 +228,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index d884d2bb00..060707fdd9 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1622,7 +1622,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 6054bd2b53..0f2fd44669 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -39,14 +39,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -585,7 +581,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -688,7 +684,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1010,7 +1006,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1023,9 +1019,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1036,8 +1032,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1046,7 +1042,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1072,7 +1068,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1082,7 +1078,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 4e2fb39105..72b913967a 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -272,6 +272,7 @@ typedef enum NodeTag
T_RollupData,
T_GroupingSetData,
T_StatisticExtInfo,
+ T_UniqueRelInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 441e64eca9..73f750700c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -532,8 +532,11 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of UniqueRelInfos, each of them recording
+ * a set of other rels for which this one has been proven
+ * unique. If this is a baserel that is made unique by an
+ * index, UniqueRelInfo also stores the information about
+ * that index.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
@@ -2498,4 +2501,19 @@ typedef struct JoinCostWorkspace
double inner_rows_total;
} JoinCostWorkspace;
+/*
+ * UniqueRelInfo records a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids. If the uniqueness is
+ * guaranteed by a unique index, this index is also saved. The values that
+ * constrain index columns, be it Vars of outer relations or Consts, are saved
+ * to column_values list.
+ */
+typedef struct UniqueRelInfo
+{
+ NodeTag tag;
+ Relids outerrelids;
+ IndexOptInfo *index;
+ List *column_values;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e70d6a3f18..a659846fa3 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -291,6 +291,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137d1d..9814f00a54 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,7 +72,8 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index e7aaddd50d..785828abdf 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -16,6 +16,7 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#include "optimizer/paths.h"
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
@@ -96,13 +97,15 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueRelInfo **unique_info);
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85dec..a57905f3ab 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 07e631d45e..dc40d33102 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4365,11 +4365,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4485,6 +4487,254 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65de39..b1e248313a 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index bf6d5c3ae4..9bca529265 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1569,6 +1569,133 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.17.1
On Thu, Jun 27, 2019 at 6:42 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
Version v.17 of the patch that fix the bug see in attachment.
While moving this to the September CF, I noticed that it needs to be
updated for the recent pg_list.h API changes.
--
Thomas Munro
https://enterprisedb.com
On 02/08/2019 04:54, Thomas Munro wrote:
On Thu, Jun 27, 2019 at 6:42 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:Version v.17 of the patch that fix the bug see in attachment.
While moving this to the September CF, I noticed that it needs to be
updated for the recent pg_list.h API changes.
The patch was updated:
1. Changes caused by pg_list.h API changes.
2. Fix the problem of joint clause_relids and required_relids changes [1]/messages/by-id/5c21029d-81a2-c999-6744-6a898fcc9a19@postgrespro.ru.
3. Add eclass mentions of removed relation into the kept relation (field
eclass_indexes was introduced by commit 3373c71553).
[1]: /messages/by-id/5c21029d-81a2-c999-6744-6a898fcc9a19@postgrespro.ru
/messages/by-id/5c21029d-81a2-c999-6744-6a898fcc9a19@postgrespro.ru
--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
Attachments:
0001-Remove-self-joins-v18.patchtext/x-patch; name=0001-Remove-self-joins-v18.patchDownload
From 72ac38de7fb55fe741677ea75b280eecb443e978 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Fri, 2 Aug 2019 11:01:47 +0500
Subject: [PATCH] Remove self joins v18
---
src/backend/nodes/outfuncs.c | 19 +-
src/backend/optimizer/path/indxpath.c | 28 +-
src/backend/optimizer/path/joinpath.c | 6 +-
src/backend/optimizer/plan/analyzejoins.c | 1021 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 7 +-
src/backend/optimizer/util/pathnode.c | 3 +-
src/backend/optimizer/util/relnode.c | 26 +-
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 22 +-
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/paths.h | 3 +-
src/include/optimizer/planmain.h | 7 +-
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 256 +++++-
src/test/regress/sql/equivclass.sql | 17 +
src/test/regress/sql/join.sql | 127 +++
16 files changed, 1480 insertions(+), 99 deletions(-)
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 86c31a48c9..9d511d1d1b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2270,7 +2270,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_OID_FIELD(userid);
WRITE_BOOL_FIELD(useridiscurrent);
/* we don't try to print fdwroutine or fdw_private */
- /* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+ WRITE_NODE_FIELD(unique_for_rels);
+ /* can't print non_unique_for_rels; BMSes aren't Nodes */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_UINT_FIELD(baserestrict_min_security);
WRITE_NODE_FIELD(joininfo);
@@ -2342,6 +2343,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
WRITE_BITMAPSET_FIELD(keys);
}
+static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+ WRITE_NODE_TYPE("UNIQUERELINFO");
+
+ WRITE_BITMAPSET_FIELD(outerrelids);
+ if (node->index)
+ {
+ WRITE_OID_FIELD(index->indexoid);
+ WRITE_NODE_FIELD(column_values);
+ }
+}
+
static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
@@ -4108,6 +4122,9 @@ outNode(StringInfo str, const void *obj)
case T_StatisticExtInfo:
_outStatisticExtInfo(str, obj);
break;
+ case T_UniqueRelInfo:
+ _outUniqueRelInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 5f339fdfde..b57be54df6 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3568,7 +3568,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueRelInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3589,12 +3590,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **unique_info)
{
ListCell *ic;
Assert(list_length(exprlist) == list_length(oprlist));
+ if (unique_info)
+ *unique_info = NULL;
+
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -3645,6 +3650,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *column_values = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3693,6 +3699,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ column_values = lappend(column_values, rinfo->outer_is_left
+ ? get_leftop(rinfo->clause)
+ : get_rightop(rinfo->clause));
break;
}
}
@@ -3735,7 +3744,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (unique_info)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *unique_info = makeNode(UniqueRelInfo);
+ (*unique_info)->index = ind;
+ (*unique_info)->column_values = list_copy(column_values);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (column_values)
+ list_free(column_values);
return true;
+ }
+ if (column_values)
+ list_free(column_values);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index dc28b56e74..450d2d4048 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d19ff4138e..9c55f5f710 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -29,6 +30,7 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
@@ -39,14 +41,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueRelInfo **unique_info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ UniqueRelInfo **unique_info);
/*
@@ -58,7 +61,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -161,7 +164,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -237,67 +239,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -561,7 +520,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -636,10 +595,17 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueRelInfo **unique_info)
{
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* We could skip a couple of tests here if we assume all callers checked
* rel_supports_distinctness first, but it doesn't seem worth taking any
@@ -654,8 +620,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ unique_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -959,6 +925,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueRelInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -967,12 +937,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueRelInfo **unique_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *unique_info;
+
+ if (unique_info_out)
+ *unique_info_out = NULL;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -992,10 +973,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ unique_info = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(unique_info->outerrelids, outerrelids))
+ {
+ if (unique_info_out)
+ *unique_info_out = unique_info;
return true; /* Success! */
+ }
}
/*
@@ -1012,7 +997,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &unique_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1025,10 +1010,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
- innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ if (!unique_info)
+ unique_info = makeNode(UniqueRelInfo);
+ unique_info->outerrelids = bms_copy(outerrelids);
+ innerrel->unique_for_rels = lappend(innerrel->unique_for_rels, unique_info);
MemoryContextSwitchTo(old_context);
+ if (unique_info_out)
+ *unique_info_out = unique_info;
+
return true; /* Success! */
}
else
@@ -1074,11 +1064,15 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueRelInfo **unique_info)
{
List *clause_list = NIL;
ListCell *lc;
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* Search for mergejoinable clauses that constrain the inner rel against
* the outer rel. If an operator is mergejoinable then it behaves like
@@ -1116,5 +1110,894 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, unique_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ ((Var *) node)->varnoold = context->newRelid;
+ return false;
+ }
+
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ Assert(bms_is_empty(em->em_nullable_relids));
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+ int cc=0;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ {
+ *sources = list_delete_cell(*sources, cell);
+ cc++;
+ }
+ else
+ {
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_relid(&rinfo->clause_relids, toRemove, toKeep);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ change_relid(&rinfo->required_relids, toRemove, toKeep);
+ change_relid(&rinfo->left_relids, toRemove, toKeep);
+ change_relid(&rinfo->right_relids, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id. Each
+ * element is a set of relation ids with which this relation has a special
+ * join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ List *toAppend;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ bool is_req_equal = (rinfo->required_relids == rinfo->clause_relids) ?
+ true : false;
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+ ListCell *otherCell;
+
+ /*
+ * We can't have an EC-derived clause that joins to some third
+ * relation
+ */
+ Assert(!(is_join_clause && rinfo->parent_ec != NULL));
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because it
+ * may be already present in the joininfo or baserestrictinfo. Still,
+ * we have to switch it to point to the remaining relation. This is
+ * important for join clauses that reference both relations, because
+ * they are included in both joininfos.
+ */
+ change_varno(rinfo->clause, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->clause_relids, toRemove->relid, toKeep->relid);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ change_relid(&rinfo->required_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->left_relids, toRemove->relid, toKeep->relid);
+ change_relid(&rinfo->right_relids, toRemove->relid, toKeep->relid);
+
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X = X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Expr *leftOp, *rightOp;
+
+ Assert(IsA(rinfo->clause, OpExpr));
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ RestrictInfo *newRinfo;
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+
+ newRinfo = make_restrictinfo((Expr *) nullTest,
+ rinfo->is_pushed_down, rinfo->outerjoin_delayed,
+ rinfo->pseudoconstant, rinfo->security_level,
+ /* required_relids defaults to clause_relids */
+ NULL,
+ rinfo->outer_relids, rinfo->nullable_relids);
+
+ /*
+ * Mark the new rinfo as derived from the same EC as the original
+ * one, so that we can detect duplicates.
+ */
+ newRinfo->parent_ec = rinfo->parent_ec;
+
+ rinfo = newRinfo;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (int i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (int i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueRelInfo *outerinfo = NULL;
+ UniqueRelInfo *innerinfo = NULL;
+ List *outerValues, *innerValues;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outerinfo);
+ if (!outerinfo || !outerinfo->index)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &innerinfo);
+ if (!innerinfo || !innerinfo->index)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outerinfo->index->indexoid != innerinfo->index->indexoid)
+ return false;
+
+ /*
+ * We have proven that for both relations, the same unique index guarantees
+ * that there is at most one row where columns equal given values. These
+ * values must be the same for both relations, or else we won't match the
+ * same row on each side of join. A value may be either Const or Var of some
+ * other relation. For the purposes of this proof, the Vars of the inner and
+ * outer relation are the same, so we replace outer varno with inner and
+ * compare the column values using equal().
+ */
+ innerValues = copyObject(innerinfo->column_values);
+ outerValues = copyObject(outerinfo->column_values);
+ change_varno((Expr *) innerValues, outer->relid, inner->relid);
+ change_varno((Expr *) outerValues, outer->relid, inner->relid);
+ if (!equal(outerValues, innerValues))
+ {
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+ return false;
+ }
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any
+ * relations that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *cell;
+ int counter;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /*
+ * This optimization won't work for tables that have inheritance
+ * children.
+ */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ for (counter = 0; counter < list_length(*joinlist); )
+ {
+ Node *node;
+
+ cell = list_nth_cell(*joinlist, counter);
+ node = (Node *) lfirst(cell);
+
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef *) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell);
+ }
+ else
+ counter++;
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index df3f8c2544..e0818d8a5e 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -226,7 +226,7 @@ query_planner(PlannerInfo *root,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -234,6 +234,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 0ac73984d2..f9ed2f2ccd 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1598,7 +1598,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 37d228ce5d..e5c15e0eab 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -39,14 +39,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -586,7 +582,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -690,7 +686,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1013,7 +1009,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1026,9 +1022,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1039,8 +1035,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1049,7 +1045,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1075,7 +1071,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1085,7 +1081,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 4e2fb39105..72b913967a 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -272,6 +272,7 @@ typedef enum NodeTag
T_RollupData,
T_GroupingSetData,
T_StatisticExtInfo,
+ T_UniqueRelInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index e3c579ee44..44e5da4da8 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -536,8 +536,11 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of UniqueRelInfos, each of them recording
+ * a set of other rels for which this one has been proven
+ * unique. If this is a baserel that is made unique by an
+ * index, UniqueRelInfo also stores the information about
+ * that index.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
@@ -2504,4 +2507,19 @@ typedef struct JoinCostWorkspace
double inner_rows_total;
} JoinCostWorkspace;
+/*
+ * UniqueRelInfo records a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids. If the uniqueness is
+ * guaranteed by a unique index, this index is also saved. The values that
+ * constrain index columns, be it Vars of outer relations or Consts, are saved
+ * to column_values list.
+ */
+typedef struct UniqueRelInfo
+{
+ NodeTag tag;
+ Relids outerrelids;
+ IndexOptInfo *index;
+ List *column_values;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 182ffeef4b..860b6c77f9 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -289,6 +289,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137d1d..9814f00a54 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,7 +72,8 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index e7aaddd50d..785828abdf 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -16,6 +16,7 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#include "optimizer/paths.h"
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
@@ -96,13 +97,15 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueRelInfo **unique_info);
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85dec..a57905f3ab 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9b407bc485..c06a8fb04f 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4490,11 +4490,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4610,6 +4612,254 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65de39..b1e248313a 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 145accca86..4890e80fef 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1626,6 +1626,133 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.17.1
On 05.08.2019 14:24, Andrey Lepikhov wrote:
On 02/08/2019 04:54, Thomas Munro wrote:
On Thu, Jun 27, 2019 at 6:42 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:Version v.17 of the patch that fix the bug see in attachment.
While moving this to the September CF, I noticed that it needs to be
updated for the recent pg_list.h API changes.The patch was updated:
1. Changes caused by pg_list.h API changes.
2. Fix the problem of joint clause_relids and required_relids changes
[1].
3. Add eclass mentions of removed relation into the kept relation
(field eclass_indexes was introduced by commit 3373c71553).[1]
/messages/by-id/5c21029d-81a2-c999-6744-6a898fcc9a19@postgrespro.ru
One more bug is fixed in this patch: OR clauses were not correctly
updated in case of self join removal.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-Remove-self-joins-v19.patchtext/x-patch; name=0001-Remove-self-joins-v19.patchDownload
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e6ce8e2..1af8d75 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2274,7 +2274,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_OID_FIELD(userid);
WRITE_BOOL_FIELD(useridiscurrent);
/* we don't try to print fdwroutine or fdw_private */
- /* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+ WRITE_NODE_FIELD(unique_for_rels);
+ /* can't print non_unique_for_rels; BMSes aren't Nodes */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_UINT_FIELD(baserestrict_min_security);
WRITE_NODE_FIELD(joininfo);
@@ -2347,6 +2348,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
}
static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+ WRITE_NODE_TYPE("UNIQUERELINFO");
+
+ WRITE_BITMAPSET_FIELD(outerrelids);
+ if (node->index)
+ {
+ WRITE_OID_FIELD(index->indexoid);
+ WRITE_NODE_FIELD(column_values);
+ }
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -4112,6 +4126,9 @@ outNode(StringInfo str, const void *obj)
case T_StatisticExtInfo:
_outStatisticExtInfo(str, obj);
break;
+ case T_UniqueRelInfo:
+ _outUniqueRelInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 5f339fd..b57be54 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3568,7 +3568,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueRelInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3589,12 +3590,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **unique_info)
{
ListCell *ic;
Assert(list_length(exprlist) == list_length(oprlist));
+ if (unique_info)
+ *unique_info = NULL;
+
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -3645,6 +3650,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *column_values = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3693,6 +3699,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ column_values = lappend(column_values, rinfo->outer_is_left
+ ? get_leftop(rinfo->clause)
+ : get_rightop(rinfo->clause));
break;
}
}
@@ -3735,7 +3744,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (unique_info)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *unique_info = makeNode(UniqueRelInfo);
+ (*unique_info)->index = ind;
+ (*unique_info)->column_values = list_copy(column_values);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (column_values)
+ list_free(column_values);
return true;
+ }
+ if (column_values)
+ list_free(column_values);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index dc28b56..450d2d4 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d19ff41..190b21b 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -29,6 +30,7 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
@@ -39,15 +41,16 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueRelInfo **unique_info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
-
+ List *restrictlist,
+ UniqueRelInfo **unique_info);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
/*
* remove_useless_joins
@@ -58,7 +61,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -161,7 +164,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -237,67 +239,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -561,7 +520,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -636,10 +595,17 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueRelInfo **unique_info)
{
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* We could skip a couple of tests here if we assume all callers checked
* rel_supports_distinctness first, but it doesn't seem worth taking any
@@ -654,8 +620,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ unique_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -959,6 +925,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueRelInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -967,12 +937,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueRelInfo **unique_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *unique_info;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ if (unique_info_out)
+ *unique_info_out = NULL;
+
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -992,10 +973,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ unique_info = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(unique_info->outerrelids, outerrelids))
+ {
+ if (unique_info_out)
+ *unique_info_out = unique_info;
return true; /* Success! */
+ }
}
/*
@@ -1012,7 +997,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &unique_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1025,10 +1010,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
- innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ if (!unique_info)
+ unique_info = makeNode(UniqueRelInfo);
+ unique_info->outerrelids = bms_copy(outerrelids);
+ innerrel->unique_for_rels = lappend(innerrel->unique_for_rels, unique_info);
MemoryContextSwitchTo(old_context);
+ if (unique_info_out)
+ *unique_info_out = unique_info;
+
return true; /* Success! */
}
else
@@ -1074,11 +1064,15 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueRelInfo **unique_info)
{
List *clause_list = NIL;
ListCell *lc;
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* Search for mergejoinable clauses that constrain the inner rel against
* the outer rel. If an operator is mergejoinable then it behaves like
@@ -1116,5 +1110,898 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, unique_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var) && ((Var *) node)->varno == context->oldRelid)
+ {
+ ((Var *) node)->varno = context->newRelid;
+ ((Var *) node)->varnoold = context->newRelid;
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(*relids, oldId), newId);
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ change_relid(&rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ change_relid(&rinfo->required_relids, from, to);
+ change_relid(&rinfo->left_relids, from, to);
+ change_relid(&rinfo->right_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ Assert(bms_is_empty(em->em_nullable_relids));
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+ int cc=0;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ {
+ *sources = list_delete_cell(*sources, cell);
+ cc++;
+ }
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id. Each
+ * element is a set of relation ids with which this relation has a special
+ * join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+} UsjScratch;
+
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ List *toAppend;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+ toAppend = list_concat(joinclauses, toRemove->baserestrictinfo);
+ toAppend = list_concat(toAppend, toRemove->joininfo);
+
+ foreach(cell, toAppend)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ bool is_join_clause = !bms_is_subset(rinfo->required_relids, joinrelids);
+ List **target = is_join_clause ? &toKeep->joininfo : &toKeep->baserestrictinfo;
+ ListCell *otherCell;
+
+ /*
+ * We can't have an EC-derived clause that joins to some third
+ * relation
+ */
+ Assert(!(is_join_clause && rinfo->parent_ec != NULL));
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because it
+ * may be already present in the joininfo or baserestrictinfo. Still,
+ * we have to switch it to point to the remaining relation. This is
+ * important for join clauses that reference both relations, because
+ * they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X = X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies)
+ {
+ Expr *leftOp, *rightOp;
+
+ Assert(IsA(rinfo->clause, OpExpr));
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ RestrictInfo *newRinfo;
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+
+ newRinfo = make_restrictinfo((Expr *) nullTest,
+ rinfo->is_pushed_down, rinfo->outerjoin_delayed,
+ rinfo->pseudoconstant, rinfo->security_level,
+ /* required_relids defaults to clause_relids */
+ NULL,
+ rinfo->outer_relids, rinfo->nullable_relids);
+
+ /*
+ * Mark the new rinfo as derived from the same EC as the original
+ * one, so that we can detect duplicates.
+ */
+ newRinfo->parent_ec = rinfo->parent_ec;
+
+ rinfo = newRinfo;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (int i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (int i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueRelInfo *outerinfo = NULL;
+ UniqueRelInfo *innerinfo = NULL;
+ List *outerValues, *innerValues;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, restrictlist, true, &outerinfo);
+ if (!outerinfo || !outerinfo->index)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, restrictlist, true, &innerinfo);
+ if (!innerinfo || !innerinfo->index)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outerinfo->index->indexoid != innerinfo->index->indexoid)
+ return false;
+
+ /*
+ * We have proven that for both relations, the same unique index guarantees
+ * that there is at most one row where columns equal given values. These
+ * values must be the same for both relations, or else we won't match the
+ * same row on each side of join. A value may be either Const or Var of some
+ * other relation. For the purposes of this proof, the Vars of the inner and
+ * outer relation are the same, so we replace outer varno with inner and
+ * compare the column values using equal().
+ */
+ innerValues = copyObject(innerinfo->column_values);
+ outerValues = copyObject(outerinfo->column_values);
+ change_varno((Expr *) innerValues, outer->relid, inner->relid);
+ change_varno((Expr *) outerValues, outer->relid, inner->relid);
+ if (!equal(outerValues, innerValues))
+ {
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+ return false;
+ }
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any
+ * relations that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *cell;
+ int counter;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /*
+ * This optimization won't work for tables that have inheritance
+ * children.
+ */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ for (counter = 0; counter < list_length(*joinlist); )
+ {
+ Node *node;
+
+ cell = list_nth_cell(*joinlist, counter);
+ node = (Node *) lfirst(cell);
+
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef *) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell);
+ }
+ else
+ counter++;
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->min_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->min_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->min_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index df3f8c2..e0818d8 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -226,7 +226,7 @@ query_planner(PlannerInfo *root,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -235,6 +235,11 @@ query_planner(PlannerInfo *root,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 34acb73..1466ec0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1598,7 +1598,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 37d228c..e5c15e0 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -39,14 +39,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -586,7 +582,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -690,7 +686,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1013,7 +1009,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1026,9 +1022,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1039,8 +1035,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1049,7 +1045,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1075,7 +1071,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1085,7 +1081,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 4e2fb39..72b9139 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -272,6 +272,7 @@ typedef enum NodeTag
T_RollupData,
T_GroupingSetData,
T_StatisticExtInfo,
+ T_UniqueRelInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index e3c579e..44e5da4 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -536,8 +536,11 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of UniqueRelInfos, each of them recording
+ * a set of other rels for which this one has been proven
+ * unique. If this is a baserel that is made unique by an
+ * index, UniqueRelInfo also stores the information about
+ * that index.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
@@ -2504,4 +2507,19 @@ typedef struct JoinCostWorkspace
double inner_rows_total;
} JoinCostWorkspace;
+/*
+ * UniqueRelInfo records a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids. If the uniqueness is
+ * guaranteed by a unique index, this index is also saved. The values that
+ * constrain index columns, be it Vars of outer relations or Consts, are saved
+ * to column_values list.
+ */
+typedef struct UniqueRelInfo
+{
+ NodeTag tag;
+ Relids outerrelids;
+ IndexOptInfo *index;
+ List *column_values;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 182ffee..860b6c7 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -289,6 +289,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137..9814f00 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,7 +72,8 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index e7aaddd..785828a 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -16,6 +16,7 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#include "optimizer/paths.h"
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
@@ -96,13 +97,15 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueRelInfo **unique_info);
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85..a57905f 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9b407bc..ba2f993 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4490,11 +4490,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4611,6 +4613,270 @@ select * from
(0 rows)
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+reset enable_hashjoin;
+reset enable_mergejoin;
+--
-- Test hints given on incorrect column references are useful
--
select t1.uunique1 from
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65d..b1e2483 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 145accc..45ca447 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1627,6 +1627,140 @@ select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
+--
-- Test hints given on incorrect column references are useful
--
Some more bug fixes in this patch.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-Remove-self-joins-20.patchtext/x-patch; name=0001-Remove-self-joins-20.patchDownload
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index b0dcd02..dc8cb9c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2274,7 +2274,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_OID_FIELD(userid);
WRITE_BOOL_FIELD(useridiscurrent);
/* we don't try to print fdwroutine or fdw_private */
- /* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+ WRITE_NODE_FIELD(unique_for_rels);
+ /* can't print non_unique_for_rels; BMSes aren't Nodes */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_UINT_FIELD(baserestrict_min_security);
WRITE_NODE_FIELD(joininfo);
@@ -2347,6 +2348,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
}
static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+ WRITE_NODE_TYPE("UNIQUERELINFO");
+
+ WRITE_BITMAPSET_FIELD(outerrelids);
+ if (node->index)
+ {
+ WRITE_OID_FIELD(index->indexoid);
+ WRITE_NODE_FIELD(column_values);
+ }
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -4122,6 +4136,9 @@ outNode(StringInfo str, const void *obj)
case T_StatisticExtInfo:
_outStatisticExtInfo(str, obj);
break;
+ case T_UniqueRelInfo:
+ _outUniqueRelInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 37b257c..3d0d03b 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3564,7 +3564,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueRelInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3585,12 +3586,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **unique_info)
{
ListCell *ic;
Assert(list_length(exprlist) == list_length(oprlist));
+ if (unique_info)
+ *unique_info = NULL;
+
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -3641,6 +3646,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *column_values = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3689,6 +3695,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ column_values = lappend(column_values, rinfo->outer_is_left
+ ? get_leftop(rinfo->clause)
+ : get_rightop(rinfo->clause));
break;
}
}
@@ -3731,7 +3740,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (unique_info)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *unique_info = makeNode(UniqueRelInfo);
+ (*unique_info)->index = ind;
+ (*unique_info)->column_values = list_copy(column_values);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (column_values)
+ list_free(column_values);
return true;
+ }
+ if (column_values)
+ list_free(column_values);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index dc28b56..450d2d4 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d19ff41..39aef73 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -29,8 +30,10 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+#include "utils/memutils.h"
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
@@ -39,15 +42,16 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueRelInfo **unique_info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
-
+ List *restrictlist,
+ UniqueRelInfo **unique_info);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
/*
* remove_useless_joins
@@ -58,7 +62,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -161,7 +165,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -237,67 +240,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -561,7 +521,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -636,10 +596,17 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueRelInfo **unique_info)
{
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* We could skip a couple of tests here if we assume all callers checked
* rel_supports_distinctness first, but it doesn't seem worth taking any
@@ -654,8 +621,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ unique_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -959,6 +926,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueRelInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -967,12 +938,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueRelInfo **unique_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *unique_info;
+
+ if (unique_info_out)
+ *unique_info_out = NULL;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -992,10 +974,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ unique_info = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(unique_info->outerrelids, outerrelids))
+ {
+ if (unique_info_out)
+ *unique_info_out = unique_info;
return true; /* Success! */
+ }
}
/*
@@ -1012,7 +998,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &unique_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1025,10 +1011,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
- innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ if (!unique_info)
+ unique_info = makeNode(UniqueRelInfo);
+ unique_info->outerrelids = bms_copy(outerrelids);
+ innerrel->unique_for_rels = lappend(innerrel->unique_for_rels, unique_info);
MemoryContextSwitchTo(old_context);
+ if (unique_info_out)
+ *unique_info_out = unique_info;
+
return true; /* Success! */
}
else
@@ -1074,11 +1065,15 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueRelInfo **unique_info)
{
List *clause_list = NIL;
ListCell *lc;
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* Search for mergejoinable clauses that constrain the inner rel against
* the outer rel. If an operator is mergejoinable then it behaves like
@@ -1116,5 +1111,1006 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, unique_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnoold = context->newRelid;
+ }
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(bms_copy(*relids), oldId), newId);
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ change_relid(&rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ change_relid(&rinfo->required_relids, from, to);
+ change_relid(&rinfo->left_relids, from, to);
+ change_relid(&rinfo->right_relids, from, to);
+ change_relid(&rinfo->outer_relids, from, to);
+ change_relid(&rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+// Assert(bms_is_empty(em->em_nullable_relids));
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+ int cc=0;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ {
+ *sources = list_delete_cell(*sources, cell);
+ cc++;
+ }
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id. Each
+ * element is a set of relation ids with which this relation has a special
+ * join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+} UsjScratch;
+
+
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+
+ /* Replace removed relation in joininfo */
+ foreach(cell, toKeep->joininfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ }
+ }
+ }
+
+ foreach(cell, joinclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ toKeep->baserestrictinfo = lappend(toKeep->baserestrictinfo, rinfo);
+ }
+ }
+ }
+
+ /* Tranfer removed relations clauses to kept relation */
+ foreach(cell, toRemove->baserestrictinfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->baserestrictinfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ /* We can't have an EC-derived clause that joins to some third relation */
+ if (otherCell != NULL)
+ continue;
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /* Tranfer removed relations clauses to kept relation */
+ foreach(cell, toRemove->joininfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->joininfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ Assert(IsA(rinfo->clause, OpExpr));
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (int i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (int i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueRelInfo *outerinfo = NULL;
+ UniqueRelInfo *innerinfo = NULL;
+ List *outerValues, *innerValues;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, list_concat(list_concat_unique(list_copy(outer->joininfo), restrictlist),
+ outer->baserestrictinfo), true, &outerinfo);
+ if (!outerinfo || !outerinfo->index)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, list_concat(list_concat_unique(list_copy(inner->joininfo), restrictlist),
+ inner->baserestrictinfo), true, &innerinfo);
+ if (!innerinfo || !innerinfo->index)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outerinfo->index->indexoid != innerinfo->index->indexoid)
+ return false;
+
+ if (restrictlist != NULL && IsA(((RestrictInfo*)linitial(restrictlist))->clause, Const))
+ return false;
+
+ /*
+ * We have proven that for both relations, the same unique index guarantees
+ * that there is at most one row where columns equal given values. These
+ * values must be the same for both relations, or else we won't match the
+ * same row on each side of join. A value may be either Const or Var of some
+ * other relation. For the purposes of this proof, the Vars of the inner and
+ * outer relation are the same, so we replace outer varno with inner and
+ * compare the column values using equal().
+ */
+ innerValues = copyObject(innerinfo->column_values);
+ outerValues = copyObject(outerinfo->column_values);
+ change_varno((Expr *) innerValues, outer->relid, inner->relid);
+ change_varno((Expr *) outerValues, outer->relid, inner->relid);
+ if (!equal(outerValues, innerValues))
+ {
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+ return false;
+ }
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+ ListCell *cell;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any
+ * relations that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (cell)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *cell;
+ int counter;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /*
+ * This optimization won't work for tables that have inheritance
+ * children.
+ */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ for (counter = 0; counter < list_length(*joinlist); )
+ {
+ Node *node;
+
+ cell = list_nth_cell(*joinlist, counter);
+ node = (Node *) lfirst(cell);
+
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef *) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell);
+ }
+ else
+ counter++;
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->syn_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->syn_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->syn_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->syn_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index f0c1b52..8413241 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -218,7 +218,7 @@ query_planner(PlannerInfo *root,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -227,6 +227,11 @@ query_planner(PlannerInfo *root,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 34acb73..1466ec0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1598,7 +1598,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8541538..88eacd2 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -39,14 +39,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -589,7 +585,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -693,7 +689,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1013,7 +1009,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1026,9 +1022,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1039,8 +1035,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1049,7 +1045,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1075,7 +1071,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1085,7 +1081,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 6b63d6f..bfc0036 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -1427,7 +1427,6 @@ AllocSetCheck(MemoryContext context)
chsize = chunk->size; /* aligned chunk size */
dsize = chunk->requested_size; /* real data */
-
/*
* Check chunk size
*/
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index bce2d59..78b4f1e 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -272,6 +272,7 @@ typedef enum NodeTag
T_RollupData,
T_GroupingSetData,
T_StatisticExtInfo,
+ T_UniqueRelInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 23a06d7..f9bb078 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -534,8 +534,11 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of UniqueRelInfos, each of them recording
+ * a set of other rels for which this one has been proven
+ * unique. If this is a baserel that is made unique by an
+ * index, UniqueRelInfo also stores the information about
+ * that index.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
@@ -2502,4 +2505,19 @@ typedef struct JoinCostWorkspace
double inner_rows_total;
} JoinCostWorkspace;
+/*
+ * UniqueRelInfo records a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids. If the uniqueness is
+ * guaranteed by a unique index, this index is also saved. The values that
+ * constrain index columns, be it Vars of outer relations or Consts, are saved
+ * to column_values list.
+ */
+typedef struct UniqueRelInfo
+{
+ NodeTag tag;
+ Relids outerrelids;
+ IndexOptInfo *index;
+ List *column_values;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index a12af54..1c2d064 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -288,6 +288,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137..9814f00 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,7 +72,8 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index e7aaddd..785828a 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -16,6 +16,7 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#include "optimizer/paths.h"
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
@@ -96,13 +97,15 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueRelInfo **unique_info);
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85..a57905f 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index b58d560..3080539 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4648,6 +4648,270 @@ select * from
(0 rows)
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+reset enable_hashjoin;
+reset enable_mergejoin;
+--
-- Test hints given on incorrect column references are useful
--
select t1.uunique1 from
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 8443c24..6034d93 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2193,16 +2193,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------
Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65d..b1e2483 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 57481d0..8d07615 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1646,6 +1646,140 @@ select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
+--
-- Test hints given on incorrect column references are useful
--
Slightly refactored version of the patch with more comments.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-Remove-self-joins-v20.patchtext/x-patch; name=0001-Remove-self-joins-v20.patchDownload
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index b0dcd02..dc8cb9c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2274,7 +2274,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_OID_FIELD(userid);
WRITE_BOOL_FIELD(useridiscurrent);
/* we don't try to print fdwroutine or fdw_private */
- /* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+ WRITE_NODE_FIELD(unique_for_rels);
+ /* can't print non_unique_for_rels; BMSes aren't Nodes */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_UINT_FIELD(baserestrict_min_security);
WRITE_NODE_FIELD(joininfo);
@@ -2347,6 +2348,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
}
static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+ WRITE_NODE_TYPE("UNIQUERELINFO");
+
+ WRITE_BITMAPSET_FIELD(outerrelids);
+ if (node->index)
+ {
+ WRITE_OID_FIELD(index->indexoid);
+ WRITE_NODE_FIELD(column_values);
+ }
+}
+
+static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
/*
@@ -4122,6 +4136,9 @@ outNode(StringInfo str, const void *obj)
case T_StatisticExtInfo:
_outStatisticExtInfo(str, obj);
break;
+ case T_UniqueRelInfo:
+ _outUniqueRelInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 37b257c..3d0d03b 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3564,7 +3564,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueRelInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3585,12 +3586,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **unique_info)
{
ListCell *ic;
Assert(list_length(exprlist) == list_length(oprlist));
+ if (unique_info)
+ *unique_info = NULL;
+
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -3641,6 +3646,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *column_values = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3689,6 +3695,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ column_values = lappend(column_values, rinfo->outer_is_left
+ ? get_leftop(rinfo->clause)
+ : get_rightop(rinfo->clause));
break;
}
}
@@ -3731,7 +3740,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (unique_info)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *unique_info = makeNode(UniqueRelInfo);
+ (*unique_info)->index = ind;
+ (*unique_info)->column_values = list_copy(column_values);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (column_values)
+ list_free(column_values);
return true;
+ }
+ if (column_values)
+ list_free(column_values);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index dc28b56..450d2d4 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d19ff41..ad60daf 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -29,8 +30,10 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+#include "utils/memutils.h"
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
@@ -39,15 +42,16 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueRelInfo **unique_info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
-
+ List *restrictlist,
+ UniqueRelInfo **unique_info);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
/*
* remove_useless_joins
@@ -58,7 +62,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -161,7 +165,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -237,67 +240,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -561,7 +521,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -636,10 +596,17 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueRelInfo **unique_info)
{
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* We could skip a couple of tests here if we assume all callers checked
* rel_supports_distinctness first, but it doesn't seem worth taking any
@@ -654,8 +621,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ unique_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -959,6 +926,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueRelInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -967,12 +938,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueRelInfo **unique_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *unique_info;
+
+ if (unique_info_out)
+ *unique_info_out = NULL;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -992,10 +974,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ unique_info = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(unique_info->outerrelids, outerrelids))
+ {
+ if (unique_info_out)
+ *unique_info_out = unique_info;
return true; /* Success! */
+ }
}
/*
@@ -1012,7 +998,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &unique_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1025,10 +1011,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
- innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ if (!unique_info)
+ unique_info = makeNode(UniqueRelInfo);
+ unique_info->outerrelids = bms_copy(outerrelids);
+ innerrel->unique_for_rels = lappend(innerrel->unique_for_rels, unique_info);
MemoryContextSwitchTo(old_context);
+ if (unique_info_out)
+ *unique_info_out = unique_info;
+
return true; /* Success! */
}
else
@@ -1074,11 +1065,15 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueRelInfo **unique_info)
{
List *clause_list = NIL;
ListCell *lc;
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* Search for mergejoinable clauses that constrain the inner rel against
* the outer rel. If an operator is mergejoinable then it behaves like
@@ -1116,5 +1111,1009 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, unique_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnoold = context->newRelid;
+ }
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(bms_copy(*relids), oldId), newId);
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ change_relid(&rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ change_relid(&rinfo->required_relids, from, to);
+ change_relid(&rinfo->left_relids, from, to);
+ change_relid(&rinfo->right_relids, from, to);
+ change_relid(&rinfo->outer_relids, from, to);
+ change_relid(&rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+ int cc=0;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ {
+ *sources = list_delete_cell(*sources, cell);
+ cc++;
+ }
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id. Each
+ * element is a set of relation ids with which this relation has a special
+ * join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+} UsjScratch;
+
+
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ int i;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+
+ /* Replace removed relation in joininfo */
+ foreach(cell, toKeep->joininfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ }
+ }
+ }
+
+ /* Replace removed relation in joinclauses.
+ * It is similar with code above but also adds substituted null-tests to baserestrict list of kept relation. */
+ foreach(cell, joinclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ toKeep->baserestrictinfo = lappend(toKeep->baserestrictinfo, rinfo);
+ }
+ }
+ }
+
+ /* Tranfer removed relation baserestrictinfo clauses to kept relation */
+ foreach(cell, toRemove->baserestrictinfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->baserestrictinfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ /* We can't have an EC-derived clause that joins to some third relation */
+ if (otherCell != NULL)
+ continue;
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /* Tranfer removed relation join-clauses to kept relation.
+ * It is similar with code above but also substituted redundant quauls with null-tests. */
+ foreach(cell, toRemove->joininfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->joininfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ Assert(IsA(rinfo->clause, OpExpr));
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueRelInfo *outerinfo = NULL;
+ UniqueRelInfo *innerinfo = NULL;
+ List *outerValues, *innerValues;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, list_concat(list_concat_unique(list_copy(outer->joininfo), restrictlist),
+ outer->baserestrictinfo), true, &outerinfo);
+ if (!outerinfo || !outerinfo->index)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, list_concat(list_concat_unique(list_copy(inner->joininfo), restrictlist),
+ inner->baserestrictinfo), true, &innerinfo);
+ if (!innerinfo || !innerinfo->index)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outerinfo->index->indexoid != innerinfo->index->indexoid)
+ return false;
+
+ if (restrictlist != NULL && IsA(((RestrictInfo*)linitial(restrictlist))->clause, Const))
+ return false;
+
+ /*
+ * We have proven that for both relations, the same unique index guarantees
+ * that there is at most one row where columns equal given values. These
+ * values must be the same for both relations, or else we won't match the
+ * same row on each side of join. A value may be either Const or Var of some
+ * other relation. For the purposes of this proof, the Vars of the inner and
+ * outer relation are the same, so we replace outer varno with inner and
+ * compare the column values using equal().
+ */
+ innerValues = copyObject(innerinfo->column_values);
+ outerValues = copyObject(outerinfo->column_values);
+ change_varno((Expr *) innerValues, outer->relid, inner->relid);
+ change_varno((Expr *) outerValues, outer->relid, inner->relid);
+ if (!equal(outerValues, innerValues))
+ {
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+ return false;
+ }
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+ ListCell *cell;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any
+ * relations that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (cell)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *cell;
+ int counter;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /*
+ * This optimization won't work for tables that have inheritance
+ * children.
+ */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ for (counter = 0; counter < list_length(*joinlist); )
+ {
+ Node *node;
+
+ cell = list_nth_cell(*joinlist, counter);
+ node = (Node *) lfirst(cell);
+
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef *) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell);
+ }
+ else
+ counter++;
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->syn_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->syn_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->syn_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->syn_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index f0c1b52..8413241 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -218,7 +218,7 @@ query_planner(PlannerInfo *root,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -227,6 +227,11 @@ query_planner(PlannerInfo *root,
reduce_unique_semijoins(root);
/*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist);
+
+ /*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
* placeholder is evaluable at a base rel.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 34acb73..1466ec0 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1598,7 +1598,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8541538..88eacd2 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -39,14 +39,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -589,7 +585,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -693,7 +689,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1013,7 +1009,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1026,9 +1022,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1039,8 +1035,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1049,7 +1045,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1075,7 +1071,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1085,7 +1081,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 6b63d6f..bfc0036 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -1427,7 +1427,6 @@ AllocSetCheck(MemoryContext context)
chsize = chunk->size; /* aligned chunk size */
dsize = chunk->requested_size; /* real data */
-
/*
* Check chunk size
*/
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index bce2d59..78b4f1e 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -272,6 +272,7 @@ typedef enum NodeTag
T_RollupData,
T_GroupingSetData,
T_StatisticExtInfo,
+ T_UniqueRelInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 23a06d7..f9bb078 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -534,8 +534,11 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of UniqueRelInfos, each of them recording
+ * a set of other rels for which this one has been proven
+ * unique. If this is a baserel that is made unique by an
+ * index, UniqueRelInfo also stores the information about
+ * that index.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
@@ -2502,4 +2505,19 @@ typedef struct JoinCostWorkspace
double inner_rows_total;
} JoinCostWorkspace;
+/*
+ * UniqueRelInfo records a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids. If the uniqueness is
+ * guaranteed by a unique index, this index is also saved. The values that
+ * constrain index columns, be it Vars of outer relations or Consts, are saved
+ * to column_values list.
+ */
+typedef struct UniqueRelInfo
+{
+ NodeTag tag;
+ Relids outerrelids;
+ IndexOptInfo *index;
+ List *column_values;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index a12af54..1c2d064 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -288,6 +288,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137..9814f00 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,7 +72,8 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index e7aaddd..785828a 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -16,6 +16,7 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#include "optimizer/paths.h"
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
@@ -96,13 +97,15 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueRelInfo **unique_info);
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85..a57905f 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index b58d560..3080539 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4648,6 +4648,270 @@ select * from
(0 rows)
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+reset enable_hashjoin;
+reset enable_mergejoin;
+--
-- Test hints given on incorrect column references are useful
--
select t1.uunique1 from
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 8443c24..6034d93 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2193,16 +2193,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------
Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65d..b1e2483 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 57481d0..8d07615 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1646,6 +1646,140 @@ select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
+--
-- Test hints given on incorrect column references are useful
--
v.21 in attechment fix small bug:
Now we move all non-mergejoinable clauses from joininfo to base restrict
info because of the relation will not be joined.
On 30/09/2019 13:29, Konstantin Knizhnik wrote:
Slightly refactored version of the patch with more comments.
--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
Attachments:
0001-Remove-self-joins-v21.patchtext/x-patch; name=0001-Remove-self-joins-v21.patchDownload
From 4540345416f1d5ac71395773621d022d15b529f1 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Tue, 5 Nov 2019 17:57:23 +0500
Subject: [PATCH] Remove self joins v21
---
src/backend/nodes/outfuncs.c | 19 +-
src/backend/optimizer/path/indxpath.c | 28 +-
src/backend/optimizer/path/joinpath.c | 6 +-
src/backend/optimizer/plan/analyzejoins.c | 1155 ++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 7 +-
src/backend/optimizer/util/pathnode.c | 3 +-
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/mmgr/aset.c | 1 -
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 22 +-
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/paths.h | 3 +-
src/include/optimizer/planmain.h | 7 +-
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 275 ++++
src/test/regress/expected/updatable_views.out | 15 +-
src/test/regress/sql/equivclass.sql | 17 +
src/test/regress/sql/join.sql | 139 ++
18 files changed, 1653 insertions(+), 107 deletions(-)
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index b0dcd02ff6..dc8cb9cec0 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2274,7 +2274,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_OID_FIELD(userid);
WRITE_BOOL_FIELD(useridiscurrent);
/* we don't try to print fdwroutine or fdw_private */
- /* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+ WRITE_NODE_FIELD(unique_for_rels);
+ /* can't print non_unique_for_rels; BMSes aren't Nodes */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_UINT_FIELD(baserestrict_min_security);
WRITE_NODE_FIELD(joininfo);
@@ -2346,6 +2347,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
WRITE_BITMAPSET_FIELD(keys);
}
+static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+ WRITE_NODE_TYPE("UNIQUERELINFO");
+
+ WRITE_BITMAPSET_FIELD(outerrelids);
+ if (node->index)
+ {
+ WRITE_OID_FIELD(index->indexoid);
+ WRITE_NODE_FIELD(column_values);
+ }
+}
+
static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
@@ -4122,6 +4136,9 @@ outNode(StringInfo str, const void *obj)
case T_StatisticExtInfo:
_outStatisticExtInfo(str, obj);
break;
+ case T_UniqueRelInfo:
+ _outUniqueRelInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 37b257cd0e..3d0d03b887 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3564,7 +3564,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueRelInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3585,12 +3586,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **unique_info)
{
ListCell *ic;
Assert(list_length(exprlist) == list_length(oprlist));
+ if (unique_info)
+ *unique_info = NULL;
+
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -3641,6 +3646,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *column_values = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3689,6 +3695,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ column_values = lappend(column_values, rinfo->outer_is_left
+ ? get_leftop(rinfo->clause)
+ : get_rightop(rinfo->clause));
break;
}
}
@@ -3731,7 +3740,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (unique_info)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *unique_info = makeNode(UniqueRelInfo);
+ (*unique_info)->index = ind;
+ (*unique_info)->column_values = list_copy(column_values);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (column_values)
+ list_free(column_values);
return true;
+ }
+ if (column_values)
+ list_free(column_values);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index dc28b56e74..450d2d4048 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d19ff4138e..76e92a7798 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -29,8 +30,10 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+#include "utils/memutils.h"
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
@@ -39,15 +42,16 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueRelInfo **unique_info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
-
+ List *restrictlist,
+ UniqueRelInfo **unique_info);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
/*
* remove_useless_joins
@@ -58,7 +62,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -161,7 +165,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -237,67 +240,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -561,7 +521,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -636,10 +596,17 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueRelInfo **unique_info)
{
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* We could skip a couple of tests here if we assume all callers checked
* rel_supports_distinctness first, but it doesn't seem worth taking any
@@ -654,8 +621,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ unique_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -959,6 +926,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueRelInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -967,12 +938,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueRelInfo **unique_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *unique_info;
+
+ if (unique_info_out)
+ *unique_info_out = NULL;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -992,10 +974,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ unique_info = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(unique_info->outerrelids, outerrelids))
+ {
+ if (unique_info_out)
+ *unique_info_out = unique_info;
return true; /* Success! */
+ }
}
/*
@@ -1012,7 +998,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &unique_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1025,10 +1011,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
- innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ if (!unique_info)
+ unique_info = makeNode(UniqueRelInfo);
+ unique_info->outerrelids = bms_copy(outerrelids);
+ innerrel->unique_for_rels = lappend(innerrel->unique_for_rels, unique_info);
MemoryContextSwitchTo(old_context);
+ if (unique_info_out)
+ *unique_info_out = unique_info;
+
return true; /* Success! */
}
else
@@ -1074,11 +1065,15 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueRelInfo **unique_info)
{
List *clause_list = NIL;
ListCell *lc;
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* Search for mergejoinable clauses that constrain the inner rel against
* the outer rel. If an operator is mergejoinable then it behaves like
@@ -1116,5 +1111,1025 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, unique_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnoold = context->newRelid;
+ }
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(bms_copy(*relids), oldId), newId);
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ change_relid(&rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ change_relid(&rinfo->required_relids, from, to);
+ change_relid(&rinfo->left_relids, from, to);
+ change_relid(&rinfo->right_relids, from, to);
+ change_relid(&rinfo->outer_relids, from, to);
+ change_relid(&rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+ int cc=0;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ {
+ *sources = list_delete_cell(*sources, cell);
+ cc++;
+ }
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id. Each
+ * element is a set of relation ids with which this relation has a special
+ * join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+} UsjScratch;
+
+
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ int i;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+
+ /* Replace removed relation in joininfo */
+ foreach(cell, toKeep->joininfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ }
+ }
+
+
+ /*
+ * The toKeep relation will not be joined. All non-mergejoinable
+ * clauses stands as filters. We need to transfer these to the
+ * baserestrictinfo.
+ */
+ if (!rinfo->mergeopfamilies && rinfo->can_join &&
+ IsA(rinfo->clause, OpExpr))
+ {
+ toKeep->baserestrictinfo = lappend(toKeep->baserestrictinfo, rinfo);
+
+ /* Update security level info */
+ toKeep->baserestrict_min_security =
+ Min(toKeep->baserestrict_min_security, rinfo->security_level);
+ }
+ }
+
+ /* Replace removed relation in joinclauses.
+ * It is similar with code above but also adds substituted null-tests to baserestrict list of kept relation. */
+ foreach(cell, joinclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ toKeep->baserestrictinfo = lappend(toKeep->baserestrictinfo, rinfo);
+ }
+ }
+ }
+
+ /* Tranfer removed relation baserestrictinfo clauses to kept relation */
+ foreach(cell, toRemove->baserestrictinfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->baserestrictinfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ /* We can't have an EC-derived clause that joins to some third relation */
+ if (otherCell != NULL)
+ continue;
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /* Transfer removed relation join-clauses to kept relation.
+ * It is similar with code above but also substituted redundant quauls with null-tests. */
+ foreach(cell, toRemove->joininfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->joininfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ Assert(IsA(rinfo->clause, OpExpr));
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueRelInfo *outerinfo = NULL;
+ UniqueRelInfo *innerinfo = NULL;
+ List *outerValues, *innerValues;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, list_concat(list_concat_unique(list_copy(outer->joininfo), restrictlist),
+ outer->baserestrictinfo), true, &outerinfo);
+ if (!outerinfo || !outerinfo->index)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, list_concat(list_concat_unique(list_copy(inner->joininfo), restrictlist),
+ inner->baserestrictinfo), true, &innerinfo);
+ if (!innerinfo || !innerinfo->index)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outerinfo->index->indexoid != innerinfo->index->indexoid)
+ return false;
+
+ if (restrictlist != NULL && IsA(((RestrictInfo*)linitial(restrictlist))->clause, Const))
+ return false;
+
+ /*
+ * We have proven that for both relations, the same unique index guarantees
+ * that there is at most one row where columns equal given values. These
+ * values must be the same for both relations, or else we won't match the
+ * same row on each side of join. A value may be either Const or Var of some
+ * other relation. For the purposes of this proof, the Vars of the inner and
+ * outer relation are the same, so we replace outer varno with inner and
+ * compare the column values using equal().
+ */
+ innerValues = copyObject(innerinfo->column_values);
+ outerValues = copyObject(outerinfo->column_values);
+ change_varno((Expr *) innerValues, outer->relid, inner->relid);
+ change_varno((Expr *) outerValues, outer->relid, inner->relid);
+ if (!equal(outerValues, innerValues))
+ {
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+ return false;
+ }
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+ ListCell *cell;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any
+ * relations that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (cell)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *cell;
+ int counter;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /*
+ * This optimization won't work for tables that have inheritance
+ * children.
+ */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ for (counter = 0; counter < list_length(*joinlist); )
+ {
+ Node *node;
+
+ cell = list_nth_cell(*joinlist, counter);
+ node = (Node *) lfirst(cell);
+
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef *) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell);
+ }
+ else
+ counter++;
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->syn_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->syn_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->syn_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->syn_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index f0c1b52a2e..8413241407 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -218,7 +218,7 @@ query_planner(PlannerInfo *root,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 34acb732ee..1466ec0dba 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1598,7 +1598,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 85415381fb..88eacd2899 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -39,14 +39,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -589,7 +585,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -693,7 +689,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1013,7 +1009,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1026,9 +1022,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1039,8 +1035,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1049,7 +1045,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1075,7 +1071,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1085,7 +1081,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index f729d9b6de..e43801cbe3 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -1479,7 +1479,6 @@ AllocSetCheck(MemoryContext context)
chsize = chunk->size; /* aligned chunk size */
dsize = chunk->requested_size; /* real data */
-
/*
* Check chunk size
*/
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index bce2d59b0d..78b4f1e317 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -272,6 +272,7 @@ typedef enum NodeTag
T_RollupData,
T_GroupingSetData,
T_StatisticExtInfo,
+ T_UniqueRelInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 23a06d718e..f9bb078bf2 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -534,8 +534,11 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of UniqueRelInfos, each of them recording
+ * a set of other rels for which this one has been proven
+ * unique. If this is a baserel that is made unique by an
+ * index, UniqueRelInfo also stores the information about
+ * that index.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
@@ -2502,4 +2505,19 @@ typedef struct JoinCostWorkspace
double inner_rows_total;
} JoinCostWorkspace;
+/*
+ * UniqueRelInfo records a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids. If the uniqueness is
+ * guaranteed by a unique index, this index is also saved. The values that
+ * constrain index columns, be it Vars of outer relations or Consts, are saved
+ * to column_values list.
+ */
+typedef struct UniqueRelInfo
+{
+ NodeTag tag;
+ Relids outerrelids;
+ IndexOptInfo *index;
+ List *column_values;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index a12af54971..1c2d0642c1 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -288,6 +288,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7345137d1d..9814f00a54 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,7 +72,8 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index e7aaddd50d..785828abdf 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -16,6 +16,7 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#include "optimizer/paths.h"
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
@@ -96,13 +97,15 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueRelInfo **unique_info);
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85dec..a57905f3ab 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index b58d560163..000c8a0eaf 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4647,6 +4647,281 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 8443c24f18..6034d93663 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2193,16 +2193,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------
Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65de39..b1e248313a 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 57481d0411..448001cf7b 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1645,6 +1645,145 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.17.1
Rebased version v.22.
- Added enable_self_join_removal GUC (true is default)
- The joinquals of the relation that is being removed, redistributed in
accordance with the remove_rel_from_query () machinery.
--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
Attachments:
0001-Remove-self-joins-v.22.patchtext/x-patch; charset=UTF-8; name=0001-Remove-self-joins-v.22.patchDownload
From 2d194aab7f8e43805a61901de34b28fcc4e136b4 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Mon, 27 Jan 2020 14:51:20 +0500
Subject: [PATCH] Remove self joins v.22
---
src/backend/nodes/outfuncs.c | 19 +-
src/backend/optimizer/path/indxpath.c | 28 +-
src/backend/optimizer/path/joinpath.c | 6 +-
src/backend/optimizer/plan/analyzejoins.c | 1174 ++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 7 +-
src/backend/optimizer/util/pathnode.c | 3 +-
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/misc/guc.c | 11 +
src/backend/utils/mmgr/aset.c | 1 -
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 22 +-
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/paths.h | 3 +-
src/include/optimizer/planmain.h | 7 +-
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 275 ++++
src/test/regress/expected/updatable_views.out | 15 +-
src/test/regress/sql/equivclass.sql | 17 +
src/test/regress/sql/join.sql | 139 ++
19 files changed, 1683 insertions(+), 107 deletions(-)
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index d76fae44b8..7b2b2641cd 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2278,7 +2278,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_OID_FIELD(userid);
WRITE_BOOL_FIELD(useridiscurrent);
/* we don't try to print fdwroutine or fdw_private */
- /* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+ WRITE_NODE_FIELD(unique_for_rels);
+ /* can't print non_unique_for_rels; BMSes aren't Nodes */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_UINT_FIELD(baserestrict_min_security);
WRITE_NODE_FIELD(joininfo);
@@ -2350,6 +2351,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
WRITE_BITMAPSET_FIELD(keys);
}
+static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+ WRITE_NODE_TYPE("UNIQUERELINFO");
+
+ WRITE_BITMAPSET_FIELD(outerrelids);
+ if (node->index)
+ {
+ WRITE_OID_FIELD(index->indexoid);
+ WRITE_NODE_FIELD(column_values);
+ }
+}
+
static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
@@ -4131,6 +4145,9 @@ outNode(StringInfo str, const void *obj)
case T_StatisticExtInfo:
_outStatisticExtInfo(str, obj);
break;
+ case T_UniqueRelInfo:
+ _outUniqueRelInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..0c6e7eee74 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3564,7 +3564,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueRelInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3585,12 +3586,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **unique_info)
{
ListCell *ic;
Assert(list_length(exprlist) == list_length(oprlist));
+ if (unique_info)
+ *unique_info = NULL;
+
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -3641,6 +3646,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *column_values = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3689,6 +3695,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ column_values = lappend(column_values, rinfo->outer_is_left
+ ? get_leftop(rinfo->clause)
+ : get_rightop(rinfo->clause));
break;
}
}
@@ -3731,7 +3740,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (unique_info)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *unique_info = makeNode(UniqueRelInfo);
+ (*unique_info)->index = ind;
+ (*unique_info)->column_values = list_copy(column_values);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (column_values)
+ list_free(column_values);
return true;
+ }
+ if (column_values)
+ list_free(column_values);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 7d2756e234..88db5d01c8 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0ff660284..49ae8510cb 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -29,8 +30,12 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+bool enable_self_join_removal;
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
@@ -39,15 +44,16 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueRelInfo **unique_info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
-
+ List *restrictlist,
+ UniqueRelInfo **unique_info);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
/*
* remove_useless_joins
@@ -58,7 +64,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -161,7 +167,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -237,67 +242,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -561,7 +523,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -636,10 +598,17 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueRelInfo **unique_info)
{
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* We could skip a couple of tests here if we assume all callers checked
* rel_supports_distinctness first, but it doesn't seem worth taking any
@@ -654,8 +623,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ unique_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -959,6 +928,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueRelInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -967,12 +940,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueRelInfo **unique_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *unique_info;
+
+ if (unique_info_out)
+ *unique_info_out = NULL;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -992,10 +976,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ unique_info = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(unique_info->outerrelids, outerrelids))
+ {
+ if (unique_info_out)
+ *unique_info_out = unique_info;
return true; /* Success! */
+ }
}
/*
@@ -1012,7 +1000,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &unique_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1025,10 +1013,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
- innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ if (!unique_info)
+ unique_info = makeNode(UniqueRelInfo);
+ unique_info->outerrelids = bms_copy(outerrelids);
+ innerrel->unique_for_rels = lappend(innerrel->unique_for_rels, unique_info);
MemoryContextSwitchTo(old_context);
+ if (unique_info_out)
+ *unique_info_out = unique_info;
+
return true; /* Success! */
}
else
@@ -1074,11 +1067,15 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueRelInfo **unique_info)
{
List *clause_list = NIL;
ListCell *lc;
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* Search for mergejoinable clauses that constrain the inner rel against
* the outer rel. If an operator is mergejoinable then it behaves like
@@ -1116,5 +1113,1042 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, unique_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ }
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(bms_copy(*relids), oldId), newId);
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ change_relid(&rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ change_relid(&rinfo->required_relids, from, to);
+ change_relid(&rinfo->left_relids, from, to);
+ change_relid(&rinfo->right_relids, from, to);
+ change_relid(&rinfo->outer_relids, from, to);
+ change_relid(&rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+ int cc=0;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ {
+ *sources = list_delete_cell(*sources, cell);
+ cc++;
+ }
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id. Each
+ * element is a set of relation ids with which this relation has a special
+ * join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+} UsjScratch;
+
+
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ int i;
+ List *joininfos = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+
+ /* Replace removed relation in joininfo */
+ foreach(cell, toKeep->joininfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ }
+ }
+ }
+
+ /* Replace removed relation in joinclauses.
+ * It is similar with code above but also adds substituted null-tests to baserestrict list of kept relation. */
+ foreach(cell, joinclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ toKeep->baserestrictinfo = lappend(toKeep->baserestrictinfo, rinfo);
+ }
+ }
+ }
+
+ /* Tranfer removed relation baserestrictinfo clauses to kept relation */
+ foreach(cell, toRemove->baserestrictinfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->baserestrictinfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ /* We can't have an EC-derived clause that joins to some third relation */
+ if (otherCell != NULL)
+ continue;
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /* Transfer removed relation join-clauses to kept relation.
+ * It is similar with code above but also substituted redundant quals with null-tests. */
+ foreach(cell, toRemove->joininfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->joininfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ Assert(IsA(rinfo->clause, OpExpr));
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ toKeep->baserestrict_min_security = Min(toKeep->baserestrict_min_security, rinfo->security_level);
+ }
+
+ /*
+ * Now the state of the joininfo list of the toKeep relation is changed:
+ * some of clauses can be removed, some need to move into baserestrictinfo.
+ * To do this use the same technique as the remove_rel_from_query() routine.
+ */
+ joininfos = list_copy(toKeep->joininfo);
+ foreach(cell, joininfos)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+
+ if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
+ {
+ /* Recheck that qual doesn't actually reference the target rel */
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * The required_relids probably aren't shared with anything else,
+ * but let's copy them just to be sure.
+ */
+ rinfo->required_relids = bms_copy(rinfo->required_relids);
+ rinfo->required_relids = bms_del_member(rinfo->required_relids,
+ toRemove->relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueRelInfo *outerinfo = NULL;
+ UniqueRelInfo *innerinfo = NULL;
+ List *outerValues, *innerValues;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, list_concat(list_concat_unique(list_copy(outer->joininfo), restrictlist),
+ outer->baserestrictinfo), true, &outerinfo);
+ if (!outerinfo || !outerinfo->index)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, list_concat(list_concat_unique(list_copy(inner->joininfo), restrictlist),
+ inner->baserestrictinfo), true, &innerinfo);
+ if (!innerinfo || !innerinfo->index)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outerinfo->index->indexoid != innerinfo->index->indexoid)
+ return false;
+
+ if (restrictlist != NULL && IsA(((RestrictInfo*)linitial(restrictlist))->clause, Const))
+ return false;
+
+ /*
+ * We have proven that for both relations, the same unique index guarantees
+ * that there is at most one row where columns equal given values. These
+ * values must be the same for both relations, or else we won't match the
+ * same row on each side of join. A value may be either Const or Var of some
+ * other relation. For the purposes of this proof, the Vars of the inner and
+ * outer relation are the same, so we replace outer varno with inner and
+ * compare the column values using equal().
+ */
+ innerValues = copyObject(innerinfo->column_values);
+ outerValues = copyObject(outerinfo->column_values);
+ change_varno((Expr *) innerValues, outer->relid, inner->relid);
+ change_varno((Expr *) outerValues, outer->relid, inner->relid);
+ if (!equal(outerValues, innerValues))
+ {
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+ return false;
+ }
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+ ListCell *cell;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * This optimization applies to inner joins only, so skip any
+ * relations that form a special join.
+ */
+ if (bms_is_member(relids[i], scratch->special_join_rels[relids[o]]))
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (cell)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static void
+remove_self_joins_one_level(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *cell;
+ int counter;
+ Relids relidsToRemove = NULL;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (cell, *joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(cell);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /*
+ * This optimization won't work for tables that have inheritance
+ * children.
+ */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ /* Delete the removed relations from joinlist. */
+ for (counter = 0; counter < list_length(*joinlist); )
+ {
+ Node *node;
+
+ cell = list_nth_cell(*joinlist, counter);
+ node = (Node *) lfirst(cell);
+
+ if (IsA(node, RangeTblRef)
+ && bms_is_member(((RangeTblRef *) node)->rtindex, relidsToRemove))
+ {
+ *joinlist = list_delete_cell(*joinlist, cell);
+ }
+ else
+ counter++;
+ }
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static void
+remove_self_joins_recurse(UsjScratch *scratch, List **joinlist)
+{
+ ListCell *lc;
+ foreach (lc, *joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ remove_self_joins_recurse(scratch, (List **) &lfirst(lc));
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ remove_self_joins_one_level(scratch, joinlist);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+void
+remove_useless_self_joins(PlannerInfo *root, List **joinlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+
+ if (!enable_self_join_removal)
+ return;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.special_join_rels = palloc0(root->simple_rel_array_size * sizeof(Relids));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+
+ /* Find out which relations have special joins to which. */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+ int bit = -1;
+ while ((bit = bms_next_member(info->syn_lefthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->syn_righthand);
+ }
+
+ bit = -1;
+ while ((bit = bms_next_member(info->syn_righthand, bit)) >= 0)
+ {
+ RelOptInfo *rel = find_base_rel(root, bit);
+ scratch.special_join_rels[rel->relid] =
+ bms_add_members(scratch.special_join_rels[rel->relid],
+ info->syn_lefthand);
+ }
+ }
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ remove_self_joins_recurse(&scratch, joinlist);
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..416478f981 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -218,7 +218,7 @@ query_planner(PlannerInfo *root,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ remove_useless_self_joins(root, &joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e6d08aede5..0ccd3f99af 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1597,7 +1597,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 374f93890b..e1452b3e6a 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -39,14 +39,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -589,7 +585,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -693,7 +689,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1026,7 +1022,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1039,9 +1035,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1052,8 +1048,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1062,7 +1058,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1088,7 +1084,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1098,7 +1094,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index cacbe904db..20aa037f18 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -125,6 +125,7 @@ extern char *temp_tablespaces;
extern bool ignore_checksum_failure;
extern bool ignore_invalid_pages;
extern bool synchronize_seqscans;
+extern bool enable_self_join_removal;
#ifdef TRACE_SYNCSCAN
extern bool trace_syncscan;
@@ -1065,6 +1066,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index c0623f106d..48685db2ba 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -1485,7 +1485,6 @@ AllocSetCheck(MemoryContext context)
chsize = chunk->size; /* aligned chunk size */
dsize = chunk->requested_size; /* real data */
-
/*
* Check chunk size
*/
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index baced7eec0..b0bcc266cb 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -272,6 +272,7 @@ typedef enum NodeTag
T_RollupData,
T_GroupingSetData,
T_StatisticExtInfo,
+ T_UniqueRelInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 3d3be197e0..417dbdb2b9 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -536,8 +536,11 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of UniqueRelInfos, each of them recording
+ * a set of other rels for which this one has been proven
+ * unique. If this is a baserel that is made unique by an
+ * index, UniqueRelInfo also stores the information about
+ * that index.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
@@ -2509,4 +2512,19 @@ typedef struct JoinCostWorkspace
double inner_rows_total;
} JoinCostWorkspace;
+/*
+ * UniqueRelInfo records a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids. If the uniqueness is
+ * guaranteed by a unique index, this index is also saved. The values that
+ * constrain index columns, be it Vars of outer relations or Consts, are saved
+ * to column_values list.
+ */
+typedef struct UniqueRelInfo
+{
+ NodeTag tag;
+ Relids outerrelids;
+ IndexOptInfo *index;
+ List *column_values;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e450fe112a..0b503e2cf4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -288,6 +288,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..f3c6204e0a 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,7 +72,8 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index eab486a621..cedbe3f05d 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -16,6 +16,7 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#include "optimizer/paths.h"
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
@@ -96,13 +97,15 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueRelInfo **unique_info);
+extern void remove_useless_self_joins(PlannerInfo *root, List **jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index c448d85dec..a57905f3ab 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -439,3 +439,35 @@ explain (costs off)
Filter: ((unique1 = unique1) OR (unique2 = unique2))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 761376b007..e51961d264 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4673,6 +4673,281 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 5de53f2782..97682cd2cc 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2214,16 +2214,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------
Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 85aa65de39..b1e248313a 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -262,3 +262,20 @@ explain (costs off)
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
+
+
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 5fc6617369..1c97635acd 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1655,6 +1655,145 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.17.1
On 1/27/20 11:10 PM, Andrey Lepikhov wrote:
Rebased version v.22.
- Added enable_self_join_removal GUC (true is default)
- The joinquals of the relation that is being removed, redistributed in
accordance with the remove_rel_from_query () machinery.
This patch no longer applies cleanly on
src/test/regress/sql/equivclass.sql:
http://cfbot.cputube.org/patch_27_1712.log
The CF entry has been updated to Waiting on Author.
Regards,
--
-David
david@pgmasters.net
On 4/1/20 8:34 PM, David Steele wrote:
This patch no longer applies cleanly on
src/test/regress/sql/equivclass.sql:
http://cfbot.cputube.org/patch_27_1712.logThe CF entry has been updated to Waiting on Author.
v.23 in attachment:
1. The patch applies cleanly.
2. Add checks: two potentially self joined relations may belong to
different rules of order restriction in join_info_list.
3. Add test for item 2.
The CF entry has been updated to Needs review.
--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
Attachments:
0001-Remove-Self-Joins-v.23.patchtext/x-patch; charset=UTF-8; name=0001-Remove-Self-Joins-v.23.patchDownload
From e74d3c2549737305419b3c29301d29c1e191d6ae Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Fri, 3 Apr 2020 09:31:58 +0500
Subject: [PATCH] Remove Self Joins v. 23
---
src/backend/nodes/outfuncs.c | 19 +-
src/backend/optimizer/path/indxpath.c | 28 +-
src/backend/optimizer/path/joinpath.c | 6 +-
src/backend/optimizer/plan/analyzejoins.c | 1178 ++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 7 +-
src/backend/optimizer/util/pathnode.c | 3 +-
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/misc/guc.c | 11 +
src/backend/utils/mmgr/aset.c | 1 -
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 22 +-
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/paths.h | 3 +-
src/include/optimizer/planmain.h | 10 +-
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 330 +++++
src/test/regress/expected/updatable_views.out | 15 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 169 +++
19 files changed, 1773 insertions(+), 108 deletions(-)
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index eb168ffd6d..a1a9ae1ac1 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2281,7 +2281,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_OID_FIELD(userid);
WRITE_BOOL_FIELD(useridiscurrent);
/* we don't try to print fdwroutine or fdw_private */
- /* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+ WRITE_NODE_FIELD(unique_for_rels);
+ /* can't print non_unique_for_rels; BMSes aren't Nodes */
WRITE_NODE_FIELD(baserestrictinfo);
WRITE_UINT_FIELD(baserestrict_min_security);
WRITE_NODE_FIELD(joininfo);
@@ -2353,6 +2354,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
WRITE_BITMAPSET_FIELD(keys);
}
+static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+ WRITE_NODE_TYPE("UNIQUERELINFO");
+
+ WRITE_BITMAPSET_FIELD(outerrelids);
+ if (node->index)
+ {
+ WRITE_OID_FIELD(index->indexoid);
+ WRITE_NODE_FIELD(column_values);
+ }
+}
+
static void
_outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
{
@@ -4135,6 +4149,9 @@ outNode(StringInfo str, const void *obj)
case T_StatisticExtInfo:
_outStatisticExtInfo(str, obj);
break;
+ case T_UniqueRelInfo:
+ _outUniqueRelInfo(str, obj);
+ break;
case T_ExtensibleNode:
_outExtensibleNode(str, obj);
break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 2a50272da6..0c6e7eee74 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3564,7 +3564,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
* relation_has_unique_index_for
* Determine whether the relation provably has at most one row satisfying
* a set of equality conditions, because the conditions constrain all
- * columns of some unique index.
+ * columns of some unique index. If index_info is not null, it is set to
+ * point to a new UniqueRelInfo containing the index and conditions.
*
* The conditions can be represented in either or both of two ways:
* 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3585,12 +3586,16 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **unique_info)
{
ListCell *ic;
Assert(list_length(exprlist) == list_length(oprlist));
+ if (unique_info)
+ *unique_info = NULL;
+
/* Short-circuit if no indexes... */
if (rel->indexlist == NIL)
return false;
@@ -3641,6 +3646,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *column_values = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3689,6 +3695,9 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+ column_values = lappend(column_values, rinfo->outer_is_left
+ ? get_leftop(rinfo->clause)
+ : get_rightop(rinfo->clause));
break;
}
}
@@ -3731,7 +3740,22 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (unique_info)
+ {
+ /* This may be called in GEQO memory context. */
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+ *unique_info = makeNode(UniqueRelInfo);
+ (*unique_info)->index = ind;
+ (*unique_info)->column_values = list_copy(column_values);
+ MemoryContextSwitchTo(oldContext);
+ }
+ if (column_values)
+ list_free(column_values);
return true;
+ }
+ if (column_values)
+ list_free(column_values);
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index db54a6ba2e..b0d1a249e7 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -176,7 +176,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -185,7 +186,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL /*index_info*/);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0ff660284..3cd297868f 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -29,8 +30,12 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+bool enable_self_join_removal;
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
@@ -39,15 +44,16 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, UniqueRelInfo **unique_info);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
-
+ List *restrictlist,
+ UniqueRelInfo **unique_info);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
/*
* remove_useless_joins
@@ -58,7 +64,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
* data structures that have to be updated are accessible via "root".
*/
List *
-remove_useless_joins(PlannerInfo *root, List *joinlist)
+remove_useless_left_joins(PlannerInfo *root, List *joinlist)
{
ListCell *lc;
@@ -161,7 +167,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
int innerrelid;
RelOptInfo *innerrel;
Relids joinrelids;
- List *clause_list = NIL;
ListCell *l;
int attroff;
@@ -237,67 +242,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
}
/*
- * Search for mergejoinable clauses that constrain the inner rel against
- * either the outer rel or a pseudoconstant. If an operator is
- * mergejoinable then it behaves like equality for some btree opclass, so
- * it's what we want. The mergejoinability test also eliminates clauses
- * containing volatile functions, which we couldn't depend on.
+ * Check for pushed-down clauses referencing the inner rel. If there is
+ * such a clause then join removal has to be disallowed. We have to
+ * check this despite the previous attr_needed checks because of the
+ * possibility of pushed-down clauses referencing the rel.
*/
foreach(l, innerrel->joininfo)
{
RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(l);
-
- /*
- * If it's not a join clause for this outer join, we can't use it.
- * Note that if the clause is pushed-down, then it is logically from
- * above the outer join, even if it references no other rels (it might
- * be from WHERE, for example).
- */
- if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
- {
- /*
- * If such a clause actually references the inner rel then join
- * removal has to be disallowed. We have to check this despite
- * the previous attr_needed checks because of the possibility of
- * pushed-down clauses referencing the rel.
- */
- if (bms_is_member(innerrelid, restrictinfo->clause_relids))
+ if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)
+ && bms_is_member(innerrel->relid, restrictinfo->clause_relids))
return false;
- continue; /* else, ignore; not useful here */
- }
-
- /* Ignore if it's not a mergejoinable clause */
- if (!restrictinfo->can_join ||
- restrictinfo->mergeopfamilies == NIL)
- continue; /* not mergejoinable */
-
- /*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
- */
- if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
- innerrel->relids))
- continue; /* no good for these input relations */
-
- /* OK, add to list */
- clause_list = lappend(clause_list, restrictinfo);
}
- /*
- * Now that we have the relevant equality join clauses, try to prove the
- * innerrel distinct.
- */
- if (rel_is_distinct_for(root, innerrel, clause_list))
- return true;
-
- /*
- * Some day it would be nice to check for other methods of establishing
- * distinctness.
- */
- return false;
+ return is_innerrel_unique_for(root, joinrelids, sjinfo->min_lefthand,
+ innerrel, sjinfo->jointype, innerrel->joininfo,
+ NULL /*unique_index*/);
}
-
/*
* Remove the target relid from the planner's data structures, having
* determined that there is no need to include it in the query.
@@ -561,7 +523,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL /*index_info*/))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -636,10 +598,17 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * If unique_index is not null, it is set to point to the index that guarantees
+ * uniqueness for a base relation.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ UniqueRelInfo **unique_info)
{
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* We could skip a couple of tests here if we assume all callers checked
* rel_supports_distinctness first, but it doesn't seem worth taking any
@@ -654,8 +623,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
- return true;
+ return relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ unique_info);
}
else if (rel->rtekind == RTE_SUBQUERY)
{
@@ -959,6 +928,10 @@ distinct_col_search(int colno, List *colnos, List *opids)
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
* sequence.
+ *
+ * If index_info_out is not null, it is set to point to a new UniqueRelInfo
+ * allocated in root memory context, that describes the index that guarantees
+ * uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -967,12 +940,23 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ UniqueRelInfo **unique_info_out)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *unique_info;
+
+ if (unique_info_out)
+ *unique_info_out = NULL;
- /* Certainly can't prove uniqueness when there are no joinclauses */
+ /*
+ * It is possible to prove uniqueness even in the absence of joinclauses,
+ * just from baserestrictinfos alone. However, in these cases the inner
+ * relation returns one row at most, so join removal won't give much
+ * benefit. It seems better to save some planning time by ignoring these
+ * cases.
+ */
if (restrictlist == NIL)
return false;
@@ -992,10 +976,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ unique_info = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(unique_info->outerrelids, outerrelids))
+ {
+ if (unique_info_out)
+ *unique_info_out = unique_info;
return true; /* Success! */
+ }
}
/*
@@ -1012,7 +1000,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &unique_info))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1025,10 +1013,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
- innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ if (!unique_info)
+ unique_info = makeNode(UniqueRelInfo);
+ unique_info->outerrelids = bms_copy(outerrelids);
+ innerrel->unique_for_rels = lappend(innerrel->unique_for_rels, unique_info);
MemoryContextSwitchTo(old_context);
+ if (unique_info_out)
+ *unique_info_out = unique_info;
+
return true; /* Success! */
}
else
@@ -1074,11 +1067,15 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ UniqueRelInfo **unique_info)
{
List *clause_list = NIL;
ListCell *lc;
+ if (unique_info)
+ *unique_info = NULL;
+
/*
* Search for mergejoinable clauses that constrain the inner rel against
* the outer rel. If an operator is mergejoinable then it behaves like
@@ -1116,5 +1113,1046 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, unique_info);
+}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ }
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(bms_copy(*relids), oldId), newId);
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ change_relid(&rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ change_relid(&rinfo->required_relids, from, to);
+ change_relid(&rinfo->left_relids, from, to);
+ change_relid(&rinfo->right_relids, from, to);
+ change_relid(&rinfo->outer_relids, from, to);
+ change_relid(&rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+ int cc=0;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ {
+ *sources = list_delete_cell(*sources, cell);
+ cc++;
+ }
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /*
+ * Array of Relids, one for each relation, indexed by relation id. Each
+ * element is a set of relation ids with which this relation has a special
+ * join.
+ */
+ Relids *special_join_rels;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+} UsjScratch;
+
+
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ int i;
+ List *joininfos = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+
+ /* Replace removed relation in joininfo */
+ foreach(cell, toKeep->joininfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ }
+ }
+ }
+
+ /* Replace removed relation in joinclauses.
+ * It is similar with code above but also adds substituted null-tests to baserestrict list of kept relation. */
+ foreach(cell, joinclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ toKeep->baserestrictinfo = lappend(toKeep->baserestrictinfo, rinfo);
+ }
+ }
+ }
+
+ /* Tranfer removed relation baserestrictinfo clauses to kept relation */
+ foreach(cell, toRemove->baserestrictinfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->baserestrictinfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ /* We can't have an EC-derived clause that joins to some third relation */
+ if (otherCell != NULL)
+ continue;
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /* Transfer removed relation join-clauses to kept relation.
+ * It is similar with code above but also substituted redundant quals with null-tests. */
+ foreach(cell, toRemove->joininfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->joininfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst(otherCell);
+ if (other == rinfo
+ || (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ Assert(IsA(rinfo->clause, OpExpr));
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ toKeep->baserestrict_min_security = Min(toKeep->baserestrict_min_security, rinfo->security_level);
+ }
+
+ /*
+ * Now the state of the joininfo list of the toKeep relation is changed:
+ * some of clauses can be removed, some need to move into baserestrictinfo.
+ * To do this use the same technique as the remove_rel_from_query() routine.
+ */
+ joininfos = list_copy(toKeep->joininfo);
+ foreach(cell, joininfos)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+
+ if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
+ {
+ /* Recheck that qual doesn't actually reference the target rel */
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * The required_relids probably aren't shared with anything else,
+ * but let's copy them just to be sure.
+ */
+ rinfo->required_relids = bms_copy(rinfo->required_relids);
+ rinfo->required_relids = bms_del_member(rinfo->required_relids,
+ toRemove->relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+}
+
+/*
+ * Test whether the relations are joined on the same unique column.
+ */
+static bool
+is_unique_self_join(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer,
+ RelOptInfo *inner, List *restrictlist)
+{
+ UniqueRelInfo *outerinfo = NULL;
+ UniqueRelInfo *innerinfo = NULL;
+ List *outerValues, *innerValues;
+
+ innerrel_is_unique(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, list_concat(list_concat_unique(list_copy(outer->joininfo), restrictlist),
+ outer->baserestrictinfo), true, &outerinfo);
+ if (!outerinfo || !outerinfo->index)
+ return false;
+
+ innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, list_concat(list_concat_unique(list_copy(inner->joininfo), restrictlist),
+ inner->baserestrictinfo), true, &innerinfo);
+ if (!innerinfo || !innerinfo->index)
+ return false;
+
+ /* We must have the same unique index for both relations. */
+ if (outerinfo->index->indexoid != innerinfo->index->indexoid)
+ return false;
+
+ if (restrictlist != NULL && IsA(((RestrictInfo*)linitial(restrictlist))->clause, Const))
+ return false;
+
+ /*
+ * We have proven that for both relations, the same unique index guarantees
+ * that there is at most one row where columns equal given values. These
+ * values must be the same for both relations, or else we won't match the
+ * same row on each side of join. A value may be either Const or Var of some
+ * other relation. For the purposes of this proof, the Vars of the inner and
+ * outer relation are the same, so we replace outer varno with inner and
+ * compare the column values using equal().
+ */
+ innerValues = copyObject(innerinfo->column_values);
+ outerValues = copyObject(outerinfo->column_values);
+ change_varno((Expr *) innerValues, outer->relid, inner->relid);
+ change_varno((Expr *) outerValues, outer->relid, inner->relid);
+ if (!equal(outerValues, innerValues))
+ {
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+ return false;
+ }
+ list_free_deep(outerValues);
+ list_free_deep(innerValues);
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+ ListCell *lc;
+ bool jinfo_check = true;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if (bms_is_member(relids[i], info->syn_lefthand) &&
+ !bms_is_member(relids[o], info->syn_lefthand))
+ jinfo_check = false;
+
+ if (bms_is_member(relids[i], info->syn_righthand) &&
+ !bms_is_member(relids[o], info->syn_righthand))
+ jinfo_check = false;
+
+ if (bms_is_member(relids[o], info->syn_lefthand) &&
+ !bms_is_member(relids[i], info->syn_lefthand))
+ jinfo_check = false;
+
+ if (bms_is_member(relids[o], info->syn_righthand) &&
+ !bms_is_member(relids[i], info->syn_righthand))
+ jinfo_check = false;
+
+ if (!jinfo_check)
+ break;
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+ if (!is_unique_self_join(root, joinrelids, outer, inner,
+ restrictlist))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static Relids
+remove_self_joins_one_level(UsjScratch *scratch, List *joinlist,
+ Relids relidsToRemove)
+{
+ ListCell *lc;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (lc, joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ /*
+ * This optimization won't work for tables that have inheritance
+ * children.
+ */
+ if (rte->inh)
+ continue;
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return relidsToRemove;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ return relidsToRemove;
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(UsjScratch *scratch, List *joinlist,
+ Relids relidsToRemove)
+{
+ ListCell *lc;
+
+ foreach (lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ relidsToRemove = remove_self_joins_recurse(
+ scratch,
+ (List *) lfirst(lc),
+ relidsToRemove);
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ return remove_self_joins_one_level(scratch, joinlist, relidsToRemove);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL condtions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+ Relids relidsToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ relidsToRemove = remove_self_joins_recurse(&scratch,
+ joinlist,
+ relidsToRemove);
+ while ((relid = bms_next_member(relidsToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ pfree(scratch.relids);
+ pfree(scratch.row_marks);
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..d5788df979 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -218,7 +218,7 @@ query_planner(PlannerInfo *root,
* jointree preprocessing, but the necessary information isn't available
* until we've built baserel data structures and classified qual clauses.
*/
- joinlist = remove_useless_joins(root, joinlist);
+ joinlist = remove_useless_left_joins(root, joinlist);
/*
* Also, reduce any semijoins with unique inner rels to plain inner joins.
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 8ba8122ee2..e22a5bcd3a 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1597,7 +1597,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL /*index_info*/))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 374f93890b..e1452b3e6a 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -39,14 +39,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -589,7 +585,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -693,7 +689,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1026,7 +1022,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1039,9 +1035,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1052,8 +1048,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1062,7 +1058,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1088,7 +1084,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1098,7 +1094,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 79bc7ac8ca..ea5cc8cce8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -126,6 +126,7 @@ extern char *temp_tablespaces;
extern bool ignore_checksum_failure;
extern bool ignore_invalid_pages;
extern bool synchronize_seqscans;
+extern bool enable_self_join_removal;
#ifdef TRACE_SYNCSCAN
extern bool trace_syncscan;
@@ -1121,6 +1122,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index c0623f106d..48685db2ba 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -1485,7 +1485,6 @@ AllocSetCheck(MemoryContext context)
chsize = chunk->size; /* aligned chunk size */
dsize = chunk->requested_size; /* real data */
-
/*
* Check chunk size
*/
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8a76afe8cc..78766b5c3d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -272,6 +272,7 @@ typedef enum NodeTag
T_RollupData,
T_GroupingSetData,
T_StatisticExtInfo,
+ T_UniqueRelInfo,
/*
* TAGS FOR MEMORY NODES (memnodes.h)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 5334a73b53..848fc2c222 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -536,8 +536,11 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
- * rels for which this one has been proven unique
+ * unique_for_rels - list of UniqueRelInfos, each of them recording
+ * a set of other rels for which this one has been proven
+ * unique. If this is a baserel that is made unique by an
+ * index, UniqueRelInfo also stores the information about
+ * that index.
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
* this one unique
@@ -2512,4 +2515,19 @@ typedef struct JoinCostWorkspace
double inner_rows_total;
} JoinCostWorkspace;
+/*
+ * UniqueRelInfo records a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids. If the uniqueness is
+ * guaranteed by a unique index, this index is also saved. The values that
+ * constrain index columns, be it Vars of outer relations or Consts, are saved
+ * to column_values list.
+ */
+typedef struct UniqueRelInfo
+{
+ NodeTag tag;
+ Relids outerrelids;
+ IndexOptInfo *index;
+ List *column_values;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index e450fe112a..0b503e2cf4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -288,6 +288,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..f3c6204e0a 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,7 +72,8 @@ extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ UniqueRelInfo **info);
extern bool indexcol_is_bool_constant_for_query(IndexOptInfo *index,
int indexcol);
extern bool match_index_to_operand(Node *operand, int indexcol,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 4781201001..05b1a0f4d6 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -16,6 +16,7 @@
#include "nodes/pathnodes.h"
#include "nodes/plannodes.h"
+#include "optimizer/paths.h"
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
@@ -96,13 +97,16 @@ extern void match_foreign_keys_to_quals(PlannerInfo *root);
/*
* prototypes for plan/analyzejoins.c
*/
-extern List *remove_useless_joins(PlannerInfo *root, List *joinlist);
+extern List *remove_useless_left_joins(PlannerInfo *root, List *joinlist);
extern void reduce_unique_semijoins(PlannerInfo *root);
extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
- Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist, bool force_cache,
+ UniqueRelInfo **index_info);
+
+extern List * remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 761376b007..55ebd3fd2c 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4673,6 +4673,336 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 5de53f2782..97682cd2cc 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2214,16 +2214,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------
Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 5fc6617369..360b1fba1b 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1655,6 +1655,175 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+--
+---- It's not clear yet what to do in some of the cases we discussed
+---- on the list, so they are disabled for now.
+--
+---- We need an index on two columns for the next couple of tests.
+--create table sl(a int, b int);
+--insert into sl values (1, 1), (1, 2), (2, 1);
+--create unique index on sl(a, b);
+--vacuum analyze sl;
+--
+---- Both sides are unique, but base quals are different
+--explain (costs off)
+--select * from sl t1, sl t2
+--where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.17.1
On Fri, 3 Apr 2020 at 17:43, Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote:
v.23 in attachment:
Hi,
This is only a partial review as I see the patch still does not
incorporate the self join detection method I mentioned in March 2019
and the one the patch only partially works.
Here's the partial review which contains the details:
1. Change to aset.c not relevant to this patch.
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -1485,7 +1485,6 @@ AllocSetCheck(MemoryContext context)
chsize = chunk->size; /* aligned chunk size */
dsize = chunk->requested_size; /* real data */
-
/*
* Check chunk size
*/
2. Why GUC_NOT_IN_SAMPLE?
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE
3. I'd expect this to remove the join.
create table t1 (
a int not null unique,
b int not null unique,
c int not null unique,
d int not null,
e int not null
);
explain select * from t1 inner join t1 t2 on t1.a=t2.a and t1.b=t2.c;
QUERY PLAN
----------------------------------------------------------------------
Hash Join (cost=52.50..88.42 rows=1 width=40)
Hash Cond: ((t1.a = t2.a) AND (t1.b = t2.c))
-> Seq Scan on t1 (cost=0.00..27.00 rows=1700 width=20)
-> Hash (cost=27.00..27.00 rows=1700 width=20)
-> Seq Scan on t1 t2 (cost=0.00..27.00 rows=1700 width=20)
(5 rows)
This one seems to work.
This one *does* seem to work.
explain select * from t1 inner join t1 t2 on t1.a=t2.a and t1.d=t2.e;
You should likely implement the method I wrote in the final paragraph in [1]/messages/by-id/CAKJS1f8p-KiEujr12k-oa52JNWWaQUjEjNg+o1MGZk4mHBn_Rg@mail.gmail.com
The basic idea here is you first process the join quals:
t1.a=t2.a and t1.b=t2.c
and keep just the ones that use the same varattno on either side
(t1.a=t2.a). Perhaps it's not worth thinking about Exprs for now, or
if you think it is you can normalise the varnos to 1 and equal()
Then just pass the remaining quals to relation_has_unique_index_for().
If it returns true then there's no need to perform the opposite check
for the other relation. It would just return the same thing.
This will allow you to get rid of using the following as proofs:
/* We must have the same unique index for both relations. */
if (outerinfo->index->indexoid != innerinfo->index->indexoid)
return false;
... as I've shown above. This only works in some cases and that's not
really good enough.
Doing thing the way I describe will allow you to get rid of all the
UniqueRelInfo stuff.
4. Should be ok for partitioned tables though:
/*
* This optimization won't work for tables that have inheritance
* children.
*/
if (rte->inh)
continue;
David
[1]: /messages/by-id/CAKJS1f8p-KiEujr12k-oa52JNWWaQUjEjNg+o1MGZk4mHBn_Rg@mail.gmail.com
Thank you for this partial review, I included your changes:
On 9/23/20 9:23 AM, David Rowley wrote:
On Fri, 3 Apr 2020 at 17:43, Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote:
Doing thing the way I describe will allow you to get rid of all the
UniqueRelInfo stuff.
Thanks for the review and sorry for the late reply.
I fixed small mistakes, mentioned in your letter.
Also I rewrote this patch at your suggestion [1].
Because of many changes, this patch can be viewed as a sketch.
To change self-join detection algorithm I used your delta patch from
[2]: /messages/by-id/CAKJS1f8cJOCGyoxi7a_LG7eu+WKF9+HTff3wp1KKS5gcUg2Qfg@mail.gmail.com
handling for demonstration. But, it is not very useful with current
infrastructure, i think.
Also I implemented one additional way for self-join detection algorithm:
if the join target list isn't contained vars from inner relation, then
we can detect self-join with only quals like a1.x=a2.y if check
innerrel_is_unique is true.
Analysis of the target list is contained in the new routine -
tlist_contains_rel_exprs - rewritten version of the build_joinrel_tlist
routine.
Also changes of the join_is_removable() routine is removed from the
patch. I couldn't understand why it is needed here.
Note, this patch causes change of one join.sql regression test output.
It is not a bug, but maybe fixed.
Applied over commit 4a071afbd0.
[1]
/messages/by-id/CAKJS1f8p-KiEujr12k-oa52JNWWaQUjEjNg+o1MGZk4mHBn_Rg@mail.gmail.com
[2]: /messages/by-id/CAKJS1f8cJOCGyoxi7a_LG7eu+WKF9+HTff3wp1KKS5gcUg2Qfg@mail.gmail.com
/messages/by-id/CAKJS1f8cJOCGyoxi7a_LG7eu+WKF9+HTff3wp1KKS5gcUg2Qfg@mail.gmail.com
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
v24-0001-Remove-self-joins.patchtext/x-patch; charset=UTF-8; name=v24-0001-Remove-self-joins.patchDownload
From b3d69fc66f9d9ecff5e43842ca71b5aeb8f7e92b Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Fri, 30 Oct 2020 10:24:40 +0500
Subject: [PATCH] Remove self-joins.
Remove inner joins of a relation to itself if can be proven that such
join can be replaced with a scan. We can build the required proofs of
uniqueness using the existing innerrel_is_unique machinery.
We can remove a self-join when for each outer row, if:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars then the inner row
is (physically) same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals looks like a.x = b.x
2. Collect all another join quals.
3. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. Proved, that this join is self-join and can be replaced by
a scan.
4. If the list from (1) is NIL, check that the vars from the inner
and outer relations falls into the join target list.
5. If vars from the inner relation can't fall into the target list,
check innerrel_is_unique() for the qual list from (2). If it returns
true then outer row matches only one inner row, not necessary same.
But this is no longer a problem here. Proved, that this is removable
self-join.
Some regression tests change due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
src/backend/optimizer/plan/analyzejoins.c | 1185 +++++++++++++++++++++
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/misc/guc.c | 10 +
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/planmain.h | 2 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 339 +++++-
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 166 +++
11 files changed, 1769 insertions(+), 19 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0ff660284..1221bf4599 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -29,8 +30,12 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+bool enable_self_join_removal;
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
@@ -47,6 +52,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
/*
@@ -1118,3 +1124,1182 @@ is_innerrel_unique_for(PlannerInfo *root,
/* Let rel_is_distinct_for() do the hard work */
return rel_is_distinct_for(root, innerrel, clause_list);
}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(bms_copy(*relids), oldId), newId);
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ change_relid(&rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ change_relid(&rinfo->required_relids, from, to);
+ change_relid(&rinfo->left_relids, from, to);
+ change_relid(&rinfo->right_relids, from, to);
+ change_relid(&rinfo->outer_relids, from, to);
+ change_relid(&rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+ int cc=0;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ {
+ *sources = list_delete_cell(*sources, cell);
+ cc++;
+ }
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ int i;
+ List *joininfos = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+
+ /* Replace removed relation in joininfo */
+ foreach(cell, toKeep->joininfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+ }
+
+ /* Replace removed relation in joinclauses.
+ * It is similar with code above but also adds substituted null-tests to
+ * baserestrict list of kept relation.
+ * joinclauses contains all restrictions, include general expressions like
+ * f(a'.x1..a'.xN, a''.x1..a''.xN) op g(a'.x1..a'.xN, a''.x1..a''.xN)
+ * that couldn't fall into the EC.
+ */
+ foreach(cell, joinclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+ ListCell *otherCell;
+ List **target = &toKeep->baserestrictinfo;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+
+ foreach(otherCell, *target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * XXX: In the case of clauses:
+ * toKeep.x = toRemove.y and toKeep.y = toRemove.x
+ * we pass into baserestrictinfo both as:
+ * toKeep.x = toKeep.y and toKeep.y = toKeep.x
+ * that is may be not fine, but correct.
+ */
+ toKeep->baserestrictinfo = lappend(toKeep->baserestrictinfo, rinfo);
+ }
+ }
+
+ /* Tranfer removed relation baserestrictinfo clauses to kept relation */
+ foreach(cell, toRemove->baserestrictinfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->baserestrictinfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ /* We can't have an EC-derived clause that joins to some third relation */
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * During replacing relids in joininfo some restrictions could change
+ * semantic from join to baserestrict info, for example:
+ * a1.x = a2.y => a1.x = a1.y
+ * If we already have restriction in the baserestrictinfo list:
+ * a1.x = a1.y, here we will remove duplicates.
+ */
+ target = &toKeep->joininfo;
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /* Transfer removed relation join-clauses to kept relation.
+ * It is similar with code above but also substituted redundant quals with
+ * null-tests.
+ */
+ foreach(cell, toRemove->joininfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->joininfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ Assert(IsA(rinfo->clause, OpExpr));
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ toKeep->baserestrict_min_security = Min(toKeep->baserestrict_min_security, rinfo->security_level);
+ }
+
+ /*
+ * Now the state of the joininfo list of the toKeep relation is changed:
+ * some of clauses can be removed, some need to move into baserestrictinfo.
+ * To do this use the same technique as the remove_rel_from_query() routine.
+ */
+ joininfos = list_copy(toKeep->joininfo);
+ foreach(cell, joininfos)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+
+ if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
+ {
+ /* Recheck that qual doesn't actually reference the target rel */
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * The required_relids probably aren't shared with anything else,
+ * but let's copy them just to be sure.
+ */
+ rinfo->required_relids = bms_copy(rinfo->required_relids);
+ rinfo->required_relids = bms_del_member(rinfo->required_relids,
+ toRemove->relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *rjoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ if (bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ rjoinquals = lappend(rjoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = rjoinquals;
+}
+
+static bool
+tlist_contains_rel_exprs(PlannerInfo *root, Relids relids, RelOptInfo *rel)
+{
+ ListCell *vars;
+
+ foreach(vars, rel->reltarget->exprs)
+ {
+ Var *var = (Var *) lfirst(vars);
+ RelOptInfo *baserel;
+ int ndx;
+
+ /*
+ * Ignore PlaceHolderVars in the input tlists; we'll make our own
+ * decisions about whether to copy them.
+ */
+ if (IsA(var, PlaceHolderVar))
+ return true;
+
+ /*
+ * Otherwise, anything in a baserel or joinrel targetlist ought to be
+ * a Var. (More general cases can only appear in appendrel child
+ * rels, which will never be seen here.)
+ */
+ if (!IsA(var, Var))
+ elog(ERROR, "unexpected node type in rel targetlist: %d",
+ (int) nodeTag(var));
+
+ /* Get the Var's original base rel */
+ baserel = find_base_rel(root, var->varno);
+
+ /* Is it still needed above this joinrel? */
+ ndx = var->varattno - baserel->min_attr;
+ if (bms_nonempty_difference(baserel->attr_needed[ndx], relids))
+ return true;
+ }
+ return false;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if (bms_is_member(relids[i], info->syn_lefthand) &&
+ !bms_is_member(relids[o], info->syn_lefthand))
+ jinfo_check = false;
+
+ if (bms_is_member(relids[i], info->syn_righthand) &&
+ !bms_is_member(relids[o], info->syn_righthand))
+ jinfo_check = false;
+
+ if (bms_is_member(relids[o], info->syn_lefthand) &&
+ !bms_is_member(relids[i], info->syn_lefthand))
+ jinfo_check = false;
+
+ if (bms_is_member(relids[o], info->syn_righthand) &&
+ !bms_is_member(relids[i], info->syn_righthand))
+ jinfo_check = false;
+
+ if (!jinfo_check)
+ break;
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * Have a chance to remove join if target list contains vars from
+ * the only one relation.
+ */
+ if (list_length(otherjoinquals) == 0)
+ {
+ /* Can't determine uniqueness without any quals. */
+ continue;
+
+ }
+ else if (!tlist_contains_rel_exprs(root, joinrelids, inner))
+ {
+ if (!innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, otherjoinquals,
+ false))
+ continue;
+ }
+ else
+ /*
+ * The target list contains vars from both inner and outer
+ * relations.
+ */
+ continue;
+ }
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ else if (!innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static Relids
+remove_self_joins_one_level(UsjScratch *scratch, List *joinlist,
+ Relids relidsToRemove)
+{
+ ListCell *lc;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (lc, joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return relidsToRemove;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ return relidsToRemove;
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(UsjScratch *scratch, List *joinlist,
+ Relids relidsToRemove)
+{
+ ListCell *lc;
+
+ foreach (lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ relidsToRemove = remove_self_joins_recurse(
+ scratch,
+ (List *) lfirst(lc),
+ relidsToRemove);
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ return remove_self_joins_one_level(scratch, joinlist, relidsToRemove);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+ Relids relidsToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ relidsToRemove = remove_self_joins_recurse(&scratch,
+ joinlist,
+ relidsToRemove);
+ while ((relid = bms_next_member(relidsToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ pfree(scratch.relids);
+ pfree(scratch.row_marks);
+ return joinlist;
+}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..3dd3269fe3 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index a203e6f1ff..db5ab283fc 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -40,14 +40,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -600,7 +596,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -706,7 +702,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1041,7 +1037,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1054,9 +1050,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1067,8 +1063,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1077,7 +1073,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1103,7 +1099,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1113,7 +1109,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a62d64eaa4..413a7d104a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1108,6 +1108,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 715a24ad29..d7bcfcd7ec 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -295,6 +295,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 81c4a7e560..16cfd3f7a4 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern double cursor_tuple_fraction;
+extern bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -106,6 +107,7 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern List * remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 6c9a5e26dd..827af6bc85 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4553,11 +4553,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -4673,6 +4675,337 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sl t2
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 81bdacf59d..9847097700 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -103,10 +103,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(18 rows)
+(19 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index dd60d6a1f3..a6bfa37380 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1655,6 +1655,172 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.25.1
On 31.10.2020 12:26, Andrey V. Lepikhov wrote:
Thank you for this partial review, I included your changes:
On 9/23/20 9:23 AM, David Rowley wrote:
On Fri, 3 Apr 2020 at 17:43, Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
Doing thing the way I describe will allow you to get rid of all the
UniqueRelInfo stuff.Thanks for the review and sorry for the late reply.
I fixed small mistakes, mentioned in your letter.
Also I rewrote this patch at your suggestion [1].
Because of many changes, this patch can be viewed as a sketch.To change self-join detection algorithm I used your delta patch from
[2]. I added in the split_selfjoin_quals routine complex expressions
handling for demonstration. But, it is not very useful with current
infrastructure, i think.Also I implemented one additional way for self-join detection
algorithm: if the join target list isn't contained vars from inner
relation, then we can detect self-join with only quals like a1.x=a2.y
if check innerrel_is_unique is true.
Analysis of the target list is contained in the new routine -
tlist_contains_rel_exprs - rewritten version of the
build_joinrel_tlist routine.Also changes of the join_is_removable() routine is removed from the
patch. I couldn't understand why it is needed here.Note, this patch causes change of one join.sql regression test output.
It is not a bug, but maybe fixed.Applied over commit 4a071afbd0.
[1]
/messages/by-id/CAKJS1f8p-KiEujr12k-oa52JNWWaQUjEjNg+o1MGZk4mHBn_Rg@mail.gmail.com
[2]
/messages/by-id/CAKJS1f8cJOCGyoxi7a_LG7eu+WKF9+HTff3wp1KKS5gcUg2Qfg@mail.gmail.com
Status update for a commitfest entry.
This entry was "Waiting on author" during this CF. As I see, the latest
message contains new version of the patch. Does it need more work? Are
you going to continue working on it?
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 31/10/2020 11:26, Andrey V. Lepikhov wrote:
+ /* + * Process restrictlist to seperate out the self join quals from + * the other quals. e.g x = x goes to selfjoinquals and a = b to + * otherjoinquals. + */ + split_selfjoin_quals(root, restrictlist, &selfjoinquals, + &otherjoinquals); + + if (list_length(selfjoinquals) == 0) + { + /* + * Have a chance to remove join if target list contains vars from + * the only one relation. + */ + if (list_length(otherjoinquals) == 0) + { + /* Can't determine uniqueness without any quals. */ + continue; + + } + else if (!tlist_contains_rel_exprs(root, joinrelids, inner)) + { + if (!innerrel_is_unique(root, joinrelids, outer->relids, + inner, JOIN_INNER, otherjoinquals, + false)) + continue; + } + else + /* + * The target list contains vars from both inner and outer + * relations. + */ + continue; + } + + /* + * Determine if the inner table can duplicate outer rows. We must + * bypass the unique rel cache here since we're possibly using a + * subset of join quals. We can use 'force_cache' = true when all + * join quals are selfjoin quals. Otherwise we could end up + * putting false negatives in the cache. + */ + else if (!innerrel_is_unique(root, joinrelids, outer->relids, + inner, JOIN_INNER, selfjoinquals, + list_length(otherjoinquals) == 0)) + continue;
I don't understand the logic here. If 'selfjoinquals' is empty, it means
that there is no join qual between the two relations, right? How can we
ever remove the join in that case? And how does the target list affect
that? Can you give an example query of that?
--- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -4553,11 +4553,13 @@ explain (costs off) select p.* from (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k where p.k = 1 and p.k = 2; - QUERY PLAN --------------------------- + QUERY PLAN +------------------------------------------------ Result One-Time Filter: false -(2 rows) + -> Index Scan using parent_pkey on parent x + Index Cond: (k = 1) +(4 rows)-- bug 5255: this is not optimizable by join removal
begin;
That doesn't seem like an improvement...
My general impression of this patch is that it's a lot of code to deal
with a rare special case. At the beginning of this thread there was
discussion on the overhead that this might add to planning queries that
don't benefit, but adding a lot of code isn't nice either, even if the
performance is acceptable. That's not necessarily a show-stopper, but it
does make me much less excited about this. I'm not sure what to suggest
to do about that, except a really vague "Can you make is simpler?"
- Heikki
Thank you for the review,
On 27.11.2020 21:49, Heikki Linnakangas wrote:
On 31/10/2020 11:26, Andrey V. Lepikhov wrote:
+ /* + * Process restrictlist to seperate out the self join quals from + * the other quals. e.g x = x goes to selfjoinquals and a = b to + * otherjoinquals. + */ + split_selfjoin_quals(root, restrictlist, &selfjoinquals, + &otherjoinquals); + + if (list_length(selfjoinquals) == 0) + { + /* + * Have a chance to remove join if target list contains vars from + * the only one relation. + */I don't understand the logic here. If 'selfjoinquals' is empty, it means
that there is no join qual between the two relations, right? How can we
ever remove the join in that case? And how does the target list affect
that? Can you give an example query of that?
Maybe it is a problem of variable naming. Following the idea of David
Rowley, we split quals into two subsets: {x==x} and another, for example
{x=y}.
First set is an trivial case of self-join: if we have unique index on
the attribute 'x', then this join is self-join.
Second set is give us a chance: if right side is unique for right side
of the qual and no vars from right side end up in the target list of the
join, then this is a self-join case. Example:
CREATE TABLE a(x int, y int);
CREATE UNIQUE INDEX ON a(x);
SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.x; -- self-join
CREATE UNIQUE INDEX ON a(y);
SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- self-join too
--- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -4553,11 +4553,13 @@ explain (costs off) select p.* from (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k where p.k = 1 and p.k = 2; - QUERY PLAN --------------------------- + QUERY PLAN +------------------------------------------------ Result One-Time Filter: false -(2 rows) + -> Index Scan using parent_pkey on parent x + Index Cond: (k = 1) +(4 rows)-- bug 5255: this is not optimizable by join removal
begin;That doesn't seem like an improvement...
I investigated this case. It is a planner feature: it simplifies dummy
joins and dummy scans to different plans. Maybe it can cause some
discussion, but this example so rare and doesn't make a problem.
My general impression of this patch is that it's a lot of code to deal
with a rare special case. At the beginning of this thread there was
discussion on the overhead that this might add to planning queries that
don't benefit, but adding a lot of code isn't nice either, even if the
performance is acceptable. That's not necessarily a show-stopper, but it
does make me much less excited about this. I'm not sure what to suggest
to do about that, except a really vague "Can you make is simpler?"
Currently planner reduces useless outer joins and unique semijoins.
Reduce self join feature continues the development of the planner in the
same direction. For example, it is needed for ORM software.
Most of the code dedicated to removing unnecessary relation and
replacing of one oid with another. We are trying to use
remove_rel_from_query() machinery. Perhaps this will allow us to make
the code shorter.
--
regards,
Andrey Lepikhov
Postgres Professional
On 28/11/2020 19:21, Andrey Lepikhov wrote:
On 27.11.2020 21:49, Heikki Linnakangas wrote:
On 31/10/2020 11:26, Andrey V. Lepikhov wrote:
+ /* + * Process restrictlist to seperate out the self join quals from + * the other quals. e.g x = x goes to selfjoinquals and a = b to + * otherjoinquals. + */ + split_selfjoin_quals(root, restrictlist, &selfjoinquals, + &otherjoinquals); + + if (list_length(selfjoinquals) == 0) + { + /* + * Have a chance to remove join if target list contains vars from + * the only one relation. + */I don't understand the logic here. If 'selfjoinquals' is empty, it means
that there is no join qual between the two relations, right? How can we
ever remove the join in that case? And how does the target list affect
that? Can you give an example query of that?Maybe it is a problem of variable naming. Following the idea of David
Rowley, we split quals into two subsets: {x==x} and another, for example
{x=y}.
First set is an trivial case of self-join: if we have unique index on
the attribute 'x', then this join is self-join.
Second set is give us a chance: if right side is unique for right side
of the qual and no vars from right side end up in the target list of the
join, then this is a self-join case. Example:CREATE TABLE a(x int, y int);
CREATE UNIQUE INDEX ON a(x);
SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.x; -- self-join
CREATE UNIQUE INDEX ON a(y);
SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- self-join too
The latter join is not "useless". The patch is returning incorrect
result for that query:
postgres=# insert into a values (1, 2);
INSERT 0 1
postgres=# insert into a values (2, 1);
INSERT 0 1
postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- WRONG RESULT
x | y
---+---
(0 rows)postgres=# set enable_self_join_removal=off;
SET
postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- CORRECT RESULT
x | y
---+---
1 | 2
2 | 1
(2 rows)
- Heikki
On 11/29/20 10:10 PM, Heikki Linnakangas wrote:
On 28/11/2020 19:21, Andrey Lepikhov wrote:
On 27.11.2020 21:49, Heikki Linnakangas wrote:
CREATE TABLE a(x int, y int);
CREATE UNIQUE INDEX ON a(x);
SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.x; -- self-join
CREATE UNIQUE INDEX ON a(y);
SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- self-join tooThe latter join is not "useless". The patch is returning incorrect
result for that query:postgres=# insert into a values (1, 2);
INSERT 0 1
postgres=# insert into a values (2, 1);
INSERT 0 1
postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- WRONG RESULT
x | y ---+---
(0 rows)postgres=# set enable_self_join_removal=off;
SET
postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- CORRECT
RESULT
x | y ---+---
1 | 2
2 | 1
(2 rows)
Thanks, it is my fault. I tried to extend this patch with foreign key
references and made a mistake.
Currently I rollback this new option (see patch in attachment), but will
be working for a while to simplify this patch.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
v25-0001-Remove-self-joins.patchtext/x-patch; charset=UTF-8; name=v25-0001-Remove-self-joins.patchDownload
From 7be9cc9790b51b6afaabe2fbcf293f1b649265ea Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Fri, 30 Oct 2020 10:24:40 +0500
Subject: [PATCH] Remove self-joins.
Remove inner joins of a relation to itself if can be proven that such
join can be replaced with a scan. We can build the required proofs of
uniqueness using the existing innerrel_is_unique machinery.
We can remove a self-join when for each outer row, if:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars then the inner row
is (physically) same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals looks like a.x = b.x
2. Collect all another join quals.
3. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. Proved, that this join is self-join and can be replaced by
a scan.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
src/backend/optimizer/plan/analyzejoins.c | 1186 +++++++++++++++++++++
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/misc/guc.c | 10 +
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/planmain.h | 2 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 331 ++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 174 +++
11 files changed, 1773 insertions(+), 16 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 806629fff2..0e92245116 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -29,8 +30,12 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+bool enable_self_join_removal;
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
@@ -47,6 +52,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
/*
@@ -1118,3 +1124,1183 @@ is_innerrel_unique_for(PlannerInfo *root,
/* Let rel_is_distinct_for() do the hard work */
return rel_is_distinct_for(root, innerrel, clause_list);
}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(bms_copy(*relids), oldId), newId);
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ change_relid(&rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ change_relid(&rinfo->required_relids, from, to);
+ change_relid(&rinfo->left_relids, from, to);
+ change_relid(&rinfo->right_relids, from, to);
+ change_relid(&rinfo->outer_relids, from, to);
+ change_relid(&rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+ int cc=0;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ {
+ *sources = list_delete_cell(*sources, cell);
+ cc++;
+ }
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ int i;
+ List *joininfos = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+
+ /* Replace removed relation in joininfo */
+ foreach(cell, toKeep->joininfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+ }
+
+ /* Replace removed relation in joinclauses.
+ * It is similar with code above but also adds substituted null-tests to
+ * baserestrict list of kept relation.
+ * joinclauses contains all restrictions, include general expressions like
+ * f(a'.x1..a'.xN, a''.x1..a''.xN) op g(a'.x1..a'.xN, a''.x1..a''.xN)
+ * that couldn't fall into the EC.
+ */
+ foreach(cell, joinclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+ ListCell *otherCell;
+ List **target = &toKeep->baserestrictinfo;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+
+ foreach(otherCell, *target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * XXX: In the case of clauses:
+ * toKeep.x = toRemove.y and toKeep.y = toRemove.x
+ * we pass into baserestrictinfo both as:
+ * toKeep.x = toKeep.y and toKeep.y = toKeep.x
+ * that is may be not fine, but correct.
+ */
+ toKeep->baserestrictinfo = lappend(toKeep->baserestrictinfo, rinfo);
+ }
+ }
+
+ /* Tranfer removed relation baserestrictinfo clauses to kept relation */
+ foreach(cell, toRemove->baserestrictinfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->baserestrictinfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ /* We can't have an EC-derived clause that joins to some third relation */
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * During replacing relids in joininfo some restrictions could change
+ * semantic from join to baserestrict info, for example:
+ * a1.x = a2.y => a1.x = a1.y
+ * If we already have restriction in the baserestrictinfo list:
+ * a1.x = a1.y, here we will remove duplicates.
+ */
+ target = &toKeep->joininfo;
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /* Transfer removed relation join-clauses to kept relation.
+ * It is similar with code above but also substituted redundant quals with
+ * null-tests.
+ */
+ foreach(cell, toRemove->joininfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->joininfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ Assert(IsA(rinfo->clause, OpExpr));
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ toKeep->baserestrict_min_security = Min(toKeep->baserestrict_min_security, rinfo->security_level);
+ }
+
+ /*
+ * Now the state of the joininfo list of the toKeep relation is changed:
+ * some of clauses can be removed, some need to move into baserestrictinfo.
+ * To do this use the same technique as the remove_rel_from_query() routine.
+ */
+ joininfos = list_copy(toKeep->joininfo);
+ foreach(cell, joininfos)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+
+ if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
+ {
+ /* Recheck that qual doesn't actually reference the target rel */
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * The required_relids probably aren't shared with anything else,
+ * but let's copy them just to be sure.
+ */
+ rinfo->required_relids = bms_copy(rinfo->required_relids);
+ rinfo->required_relids = bms_del_member(rinfo->required_relids,
+ toRemove->relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *rjoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ if (bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ rjoinquals = lappend(rjoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = rjoinquals;
+}
+
+static bool
+tlist_contains_rel_exprs(PlannerInfo *root, Relids relids, RelOptInfo *rel)
+{
+ ListCell *vars;
+
+ foreach(vars, rel->reltarget->exprs)
+ {
+ Var *var = (Var *) lfirst(vars);
+ RelOptInfo *baserel;
+ int ndx;
+
+ /*
+ * Ignore PlaceHolderVars in the input tlists; we'll make our own
+ * decisions about whether to copy them.
+ */
+ if (IsA(var, PlaceHolderVar))
+ return true;
+
+ /*
+ * Otherwise, anything in a baserel or joinrel targetlist ought to be
+ * a Var. (More general cases can only appear in appendrel child
+ * rels, which will never be seen here.)
+ */
+ if (!IsA(var, Var))
+ elog(ERROR, "unexpected node type in rel targetlist: %d",
+ (int) nodeTag(var));
+
+ /* Get the Var's original base rel */
+ baserel = find_base_rel(root, var->varno);
+
+ /* Is it still needed above this joinrel? */
+ ndx = var->varattno - baserel->min_attr;
+ if (bms_nonempty_difference(baserel->attr_needed[ndx], relids))
+ return true;
+ }
+ return false;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if (bms_is_member(relids[i], info->syn_lefthand) &&
+ !bms_is_member(relids[o], info->syn_lefthand))
+ jinfo_check = false;
+
+ if (bms_is_member(relids[i], info->syn_righthand) &&
+ !bms_is_member(relids[o], info->syn_righthand))
+ jinfo_check = false;
+
+ if (bms_is_member(relids[o], info->syn_lefthand) &&
+ !bms_is_member(relids[i], info->syn_lefthand))
+ jinfo_check = false;
+
+ if (bms_is_member(relids[o], info->syn_righthand) &&
+ !bms_is_member(relids[i], info->syn_righthand))
+ jinfo_check = false;
+
+ if (!jinfo_check)
+ break;
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * Have a chance to remove join if target list contains vars from
+ * the only one relation.
+ */
+ if (list_length(otherjoinquals) == 0)
+ {
+ /* Can't determine uniqueness without any quals. */
+ continue;
+
+ }
+ else if (!tlist_contains_rel_exprs(root, joinrelids, inner))
+ {
+ /*
+ * TODO:
+ * In this case, we only have a chance in the case of a
+ * foreign key reference.
+ */
+ continue;
+ }
+ else
+ /*
+ * The target list contains vars from both inner and outer
+ * relations.
+ */
+ continue;
+ }
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ else if (!innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static Relids
+remove_self_joins_one_level(UsjScratch *scratch, List *joinlist,
+ Relids relidsToRemove)
+{
+ ListCell *lc;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (lc, joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return relidsToRemove;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ return relidsToRemove;
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(UsjScratch *scratch, List *joinlist,
+ Relids relidsToRemove)
+{
+ ListCell *lc;
+
+ foreach (lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ relidsToRemove = remove_self_joins_recurse(
+ scratch,
+ (List *) lfirst(lc),
+ relidsToRemove);
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ return remove_self_joins_one_level(scratch, joinlist, relidsToRemove);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+ Relids relidsToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ relidsToRemove = remove_self_joins_recurse(&scratch,
+ joinlist,
+ relidsToRemove);
+ while ((relid = bms_next_member(relidsToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ pfree(scratch.relids);
+ pfree(scratch.row_marks);
+ return joinlist;
+}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..3dd3269fe3 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 76245c1ff3..a7dc7bed1b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -40,14 +40,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -599,7 +595,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -704,7 +700,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1038,7 +1034,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1051,9 +1047,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1064,8 +1060,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1074,7 +1070,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1100,7 +1096,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1110,7 +1106,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 245a3472bc..71b43ed17a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1108,6 +1108,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 715a24ad29..d7bcfcd7ec 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -295,6 +295,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 81c4a7e560..16cfd3f7a4 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern double cursor_tuple_fraction;
+extern bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -106,6 +107,7 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern List * remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 60b621b651..2683f93c62 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4674,6 +4674,337 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sl t2
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 81bdacf59d..9847097700 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -103,10 +103,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(18 rows)
+(19 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index d687216618..79c7a4b6ed 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1656,6 +1656,180 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index sl_idx on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+DROP INDEX sl_idx;
+CREATE UNIQUE INDEX ON sl(a);
+CREATE UNIQUE INDEX ON sl(b);
+explain (COSTS OFF)
+SELECT a1.* FROM sl a1, sl a2 WHERE a1.a = a2.a; -- self-join
+explain (COSTS OFF)
+SELECT a1.* FROM sl a1, sl a2 WHERE a1.a = a2.b; -- not self-join
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.25.1
Hi Andrey,
On Mon, Nov 30, 2020 at 2:51 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
On 11/29/20 10:10 PM, Heikki Linnakangas wrote:
On 28/11/2020 19:21, Andrey Lepikhov wrote:
On 27.11.2020 21:49, Heikki Linnakangas wrote:
CREATE TABLE a(x int, y int);
CREATE UNIQUE INDEX ON a(x);
SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.x; -- self-join
CREATE UNIQUE INDEX ON a(y);
SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- self-join tooThe latter join is not "useless". The patch is returning incorrect
result for that query:postgres=# insert into a values (1, 2);
INSERT 0 1
postgres=# insert into a values (2, 1);
INSERT 0 1
postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- WRONG RESULT
x | y ---+---
(0 rows)postgres=# set enable_self_join_removal=off;
SET
postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- CORRECT
RESULT
x | y ---+---
1 | 2
2 | 1
(2 rows)Thanks, it is my fault. I tried to extend this patch with foreign key
references and made a mistake.
Currently I rollback this new option (see patch in attachment), but will
be working for a while to simplify this patch.
Are you working to simplify the patch? This patch has been "Waiting on
Author" for 1 month and doesn't seem to pass cfbot tests. Please
update the patch.
Regards,
--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
On 1/7/21 7:08 PM, Masahiko Sawada wrote:
On Mon, Nov 30, 2020 at 2:51 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:Thanks, it is my fault. I tried to extend this patch with foreign key
references and made a mistake.
Currently I rollback this new option (see patch in attachment), but will
be working for a while to simplify this patch.Are you working to simplify the patch? This patch has been "Waiting on
Author" for 1 month and doesn't seem to pass cfbot tests. Please
update the patch.
Yes, I'm working to improve this feature.
In attachment - fixed and rebased on ce6a71fa53.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
v25_1-0001-Remove-self-joins.patchtext/x-patch; charset=UTF-8; name=v25_1-0001-Remove-self-joins.patchDownload
From 3caeb297320af690be71b367329d86c49564e231 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Mon, 11 Jan 2021 09:01:11 +0500
Subject: [PATCH] Remove self-joins.
Remove inner joins of a relation to itself if can be proven that such
join can be replaced with a scan. We can build the required proofs of
uniqueness using the existing innerrel_is_unique machinery.
We can remove a self-join when for each outer row, if:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars then the inner row
is (physically) same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals looks like a.x = b.x
2. Collect all another join quals.
3. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. Proved, that this join is self-join and can be replaced by
a scan.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
src/backend/optimizer/plan/analyzejoins.c | 1186 +++++++++++++++++++++
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/misc/guc.c | 10 +
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/planmain.h | 2 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 331 ++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 166 +++
11 files changed, 1765 insertions(+), 16 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 90460a69bd..d631e95f89 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -29,8 +30,12 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+bool enable_self_join_removal;
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
@@ -47,6 +52,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
/*
@@ -1118,3 +1124,1183 @@ is_innerrel_unique_for(PlannerInfo *root,
/* Let rel_is_distinct_for() do the hard work */
return rel_is_distinct_for(root, innerrel, clause_list);
}
+
+typedef struct
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static void
+change_relid(Relids *relids, Index oldId, Index newId)
+{
+ if (bms_is_member(oldId, *relids))
+ *relids = bms_add_member(bms_del_member(bms_copy(*relids), oldId), newId);
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ change_relid(&rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ change_relid(&rinfo->required_relids, from, to);
+ change_relid(&rinfo->left_relids, from, to);
+ change_relid(&rinfo->right_relids, from, to);
+ change_relid(&rinfo->outer_relids, from, to);
+ change_relid(&rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_relid(&em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+ int cc=0;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ {
+ *sources = list_delete_cell(*sources, cell);
+ cc++;
+ }
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Scratch space for the unique self join removal code.
+ */
+typedef struct
+{
+ PlannerInfo *root;
+
+ /* Temporary array for relation ids. */
+ Index *relids;
+
+ /* Array of row marks indexed by relid. */
+ PlanRowMark **row_marks;
+
+ /* Bitmapset for join relids that is used to avoid reallocation. */
+ Relids joinrelids;
+} UsjScratch;
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(UsjScratch *scratch, Relids joinrelids, List *joinclauses,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ PlannerInfo *root = scratch->root;
+ ListCell *cell;
+ int i;
+ List *joininfos = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+
+ /* Replace removed relation in joininfo */
+ foreach(cell, toKeep->joininfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+ }
+
+ /* Replace removed relation in joinclauses.
+ * It is similar with code above but also adds substituted null-tests to
+ * baserestrict list of kept relation.
+ * joinclauses contains all restrictions, include general expressions like
+ * f(a'.x1..a'.xN, a''.x1..a''.xN) op g(a'.x1..a'.xN, a''.x1..a''.xN)
+ * that couldn't fall into the EC.
+ */
+ foreach(cell, joinclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+ ListCell *otherCell;
+ List **target = &toKeep->baserestrictinfo;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+
+ foreach(otherCell, *target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * XXX: In the case of clauses:
+ * toKeep.x = toRemove.y and toKeep.y = toRemove.x
+ * we pass into baserestrictinfo both as:
+ * toKeep.x = toKeep.y and toKeep.y = toKeep.x
+ * that is may be not fine, but correct.
+ */
+ toKeep->baserestrictinfo = lappend(toKeep->baserestrictinfo, rinfo);
+ }
+ }
+
+ /* Tranfer removed relation baserestrictinfo clauses to kept relation */
+ foreach(cell, toRemove->baserestrictinfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->baserestrictinfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ /* We can't have an EC-derived clause that joins to some third relation */
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * During replacing relids in joininfo some restrictions could change
+ * semantic from join to baserestrict info, for example:
+ * a1.x = a2.y => a1.x = a1.y
+ * If we already have restriction in the baserestrictinfo list:
+ * a1.x = a1.y, here we will remove duplicates.
+ */
+ target = &toKeep->joininfo;
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ *target = lappend(*target, rinfo);
+ }
+
+ /* Transfer removed relation join-clauses to kept relation.
+ * It is similar with code above but also substituted redundant quals with
+ * null-tests.
+ */
+ foreach(cell, toRemove->joininfo)
+ {
+ ListCell *otherCell;
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ List **target = &toKeep->joininfo;
+
+ /*
+ * Replace the references to the removed relation with references to
+ * the remaining one. We won't necessarily add this clause, because
+ * it may be already present in the joininfo or baserestrictinfo.
+ * Still, we have to switch it to point to the remaining relation.
+ * This is important for join clauses that reference both relations,
+ * because they are included in both joininfos.
+ */
+ change_rinfo(rinfo, toRemove->relid, toKeep->relid);
+
+ /*
+ * Don't add the clause if it is already present in the list, or
+ * derived from the same equivalence class, or is the same as another
+ * clause.
+ */
+ foreach (otherCell, *target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ continue;
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp, *rightOp;
+
+ Assert(IsA(rinfo->clause, OpExpr));
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr*)nullTest;
+ }
+ }
+
+ *target = lappend(*target, rinfo);
+ toKeep->baserestrict_min_security = Min(toKeep->baserestrict_min_security, rinfo->security_level);
+ }
+
+ /*
+ * Now the state of the joininfo list of the toKeep relation is changed:
+ * some of clauses can be removed, some need to move into baserestrictinfo.
+ * To do this use the same technique as the remove_rel_from_query() routine.
+ */
+ joininfos = list_copy(toKeep->joininfo);
+ foreach(cell, joininfos)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+
+ if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
+ {
+ /* Recheck that qual doesn't actually reference the target rel */
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * The required_relids probably aren't shared with anything else,
+ * but let's copy them just to be sure.
+ */
+ rinfo->required_relids = bms_copy(rinfo->required_relids);
+ rinfo->required_relids = bms_del_member(rinfo->required_relids,
+ toRemove->relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (scratch->row_marks[toRemove->relid])
+ {
+ PlanRowMark **markToRemove = &scratch->row_marks[toRemove->relid];
+ PlanRowMark **markToKeep = &scratch->row_marks[toKeep->relid];
+
+ if (*markToKeep)
+ {
+ Assert((*markToKeep)->markType == (*markToRemove)->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, *markToKeep);
+ *markToKeep = NULL;
+ }
+ else
+ {
+ *markToKeep = *markToRemove;
+ *markToRemove = NULL;
+
+ /* Shouldn't have inheritance children here. */
+ Assert((*markToKeep)->rti == (*markToKeep)->prti);
+
+ (*markToKeep)->rti = toKeep->relid;
+ (*markToKeep)->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Likewise replace references in SpecialJoinInfo data structures.
+ *
+ * This is relevant in case the join we're deleting is nested inside some
+ * special joins: the upper joins' relid sets have to be adjusted.
+ */
+ foreach (cell, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(cell);
+
+ change_relid(&sjinfo->min_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->min_righthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_lefthand, toRemove->relid, toKeep->relid);
+ change_relid(&sjinfo->syn_righthand, toRemove->relid, toKeep->relid);
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Likewise update references in PlaceHolderVar data structures.
+ */
+ foreach(cell, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(cell);
+
+ /*
+ * We are at an inner join of two base relations. A placeholder can't
+ * be needed here or evaluated here.
+ */
+ Assert(!bms_is_subset(phinfo->ph_eval_at, joinrelids));
+ Assert(!bms_is_subset(phinfo->ph_needed, joinrelids));
+
+ change_relid(&phinfo->ph_eval_at, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_needed, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_lateral, toRemove->relid, toKeep->relid);
+ change_relid(&phinfo->ph_var->phrels, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ change_relid(&ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Mark the rel as "dead" to show it is no longer part of the join tree.
+ * (Removing it from the baserel array altogether seems too risky.)
+ */
+ toRemove->reloptkind = RELOPT_DEADREL;
+
+ /*
+ * Update references to the removed relation from other baserels.
+ */
+ for (i = 1; i < root->simple_rel_array_size; i++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[i];
+ int attroff;
+
+ /* no point in processing target rel itself */
+ if (i == toRemove->relid)
+ continue;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == i); /* sanity check on array */
+
+ /* Update attr_needed arrays. */
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0; attroff--)
+ {
+ change_relid(&otherrel->attr_needed[attroff], toRemove->relid,
+ toKeep->relid);
+ }
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *rjoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ if (bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ rjoinquals = lappend(rjoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = rjoinquals;
+}
+
+static bool
+tlist_contains_rel_exprs(PlannerInfo *root, Relids relids, RelOptInfo *rel)
+{
+ ListCell *vars;
+
+ foreach(vars, rel->reltarget->exprs)
+ {
+ Var *var = (Var *) lfirst(vars);
+ RelOptInfo *baserel;
+ int ndx;
+
+ /*
+ * Ignore PlaceHolderVars in the input tlists; we'll make our own
+ * decisions about whether to copy them.
+ */
+ if (IsA(var, PlaceHolderVar))
+ return true;
+
+ /*
+ * Otherwise, anything in a baserel or joinrel targetlist ought to be
+ * a Var. (More general cases can only appear in appendrel child
+ * rels, which will never be seen here.)
+ */
+ if (!IsA(var, Var))
+ elog(ERROR, "unexpected node type in rel targetlist: %d",
+ (int) nodeTag(var));
+
+ /* Get the Var's original base rel */
+ baserel = find_base_rel(root, var->varno);
+
+ /* Is it still needed above this joinrel? */
+ ndx = var->varattno - baserel->min_attr;
+ if (bms_nonempty_difference(baserel->attr_needed[ndx], relids))
+ return true;
+ }
+ return false;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(UsjScratch *scratch, Index *relids, int n)
+{
+ PlannerInfo *root = scratch->root;
+ Relids joinrelids = scratch->joinrelids;
+ Relids result = NULL;
+ int i, o;
+
+ if (n < 2)
+ return NULL;
+
+ for (o = 0; o < n; o++)
+ {
+ RelOptInfo *outer = root->simple_rel_array[relids[o]];
+
+ for (i = o + 1; i < n; i++)
+ {
+ RelOptInfo *inner = root->simple_rel_array[relids[i]];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[relids[i]]->relid
+ == root->simple_rte_array[relids[o]]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if (bms_is_member(relids[i], info->syn_lefthand) &&
+ !bms_is_member(relids[o], info->syn_lefthand))
+ jinfo_check = false;
+
+ if (bms_is_member(relids[i], info->syn_righthand) &&
+ !bms_is_member(relids[o], info->syn_righthand))
+ jinfo_check = false;
+
+ if (bms_is_member(relids[o], info->syn_lefthand) &&
+ !bms_is_member(relids[i], info->syn_lefthand))
+ jinfo_check = false;
+
+ if (bms_is_member(relids[o], info->syn_righthand) &&
+ !bms_is_member(relids[i], info->syn_righthand))
+ jinfo_check = false;
+
+ if (!jinfo_check)
+ break;
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, relids[o]);
+ joinrelids = bms_add_member(joinrelids, relids[i]);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * Have a chance to remove join if target list contains vars from
+ * the only one relation.
+ */
+ if (list_length(otherjoinquals) == 0)
+ {
+ /* Can't determine uniqueness without any quals. */
+ continue;
+
+ }
+ else if (!tlist_contains_rel_exprs(root, joinrelids, inner))
+ {
+ /*
+ * TODO:
+ * In this case, we only have a chance in the case of a
+ * foreign key reference.
+ */
+ continue;
+ }
+ else
+ /*
+ * The target list contains vars from both inner and outer
+ * relations.
+ */
+ continue;
+ }
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ else if (!innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0))
+ continue;
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (scratch->row_marks[relids[i]] && scratch->row_marks[relids[o]]
+ && scratch->row_marks[relids[i]]->markType
+ != scratch->row_marks[relids[o]]->markType)
+ {
+ continue;
+ }
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(scratch, joinrelids, restrictlist, inner, outer);
+ result = bms_add_member(result, relids[o]);
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, relids[o], relids[i]);
+ change_varno((Expr *) root->parse->havingQual, relids[o], relids[i]);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ scratch->joinrelids = joinrelids;
+ return result;
+}
+
+/*
+ * A qsort comparator to sort the relids by the relation Oid.
+ */
+static int
+compare_rte(const Index *left, const Index *right, PlannerInfo *root)
+{
+ Oid l = root->simple_rte_array[*left]->relid;
+ Oid r = root->simple_rte_array[*right]->relid;
+
+ return l < r ? 1 : (l == r ? 0 : -1);
+}
+
+/*
+ * Find and remove unique self joins on a particular level of the join tree.
+ *
+ * We sort the relations by Oid and then examine each group with the same Oid.
+ * If we removed any relation, remove it from joinlist as well.
+ */
+static Relids
+remove_self_joins_one_level(UsjScratch *scratch, List *joinlist,
+ Relids relidsToRemove)
+{
+ ListCell *lc;
+ Oid groupOid;
+ int groupStart;
+ int i;
+ int n = 0;
+ Index *relid_ascending = scratch->relids;
+ PlannerInfo *root = scratch->root;
+
+ /*
+ * Collect the ids of base relations at this level of the join tree.
+ */
+ foreach (lc, joinlist)
+ {
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ if (!IsA(ref, RangeTblRef))
+ continue;
+
+ rte = root->simple_rte_array[ref->rtindex];
+ rel = root->simple_rel_array[ref->rtindex];
+
+ /* We only care about base relations from which we select something. */
+ if (rte->rtekind != RTE_RELATION || rte->relkind != RELKIND_RELATION
+ || rel == NULL)
+ {
+ continue;
+ }
+
+ relid_ascending[n++] = ref->rtindex;
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (n > join_collapse_limit)
+ break;
+ }
+
+ if (n < 2)
+ return relidsToRemove;
+
+ /*
+ * Find and process the groups of relations that have same Oid.
+ */
+ qsort_arg(relid_ascending, n, sizeof(*relid_ascending),
+ (qsort_arg_comparator) compare_rte, root);
+ groupOid = root->simple_rte_array[relid_ascending[0]]->relid;
+ groupStart = 0;
+ for (i = 1; i < n; i++)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[relid_ascending[i]];
+ Assert(rte->relid != InvalidOid);
+ if (rte->relid != groupOid)
+ {
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ i - groupStart));
+ groupOid = rte->relid;
+ groupStart = i;
+ }
+ }
+ Assert(groupOid != InvalidOid);
+ Assert(groupStart < n);
+ relidsToRemove = bms_add_members(relidsToRemove,
+ remove_self_joins_one_group(scratch, &relid_ascending[groupStart],
+ n - groupStart));
+
+ return relidsToRemove;
+}
+
+/*
+ * Find and remove unique self joins on a single level of a join tree, and
+ * recurse to handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(UsjScratch *scratch, List *joinlist,
+ Relids relidsToRemove)
+{
+ ListCell *lc;
+
+ foreach (lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ relidsToRemove = remove_self_joins_recurse(
+ scratch,
+ (List *) lfirst(lc),
+ relidsToRemove);
+ break;
+ case T_RangeTblRef:
+ break;
+ default:
+ Assert(false);
+ }
+ }
+ return remove_self_joins_one_level(scratch, joinlist, relidsToRemove);
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relation as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ ListCell *lc;
+ UsjScratch scratch;
+ Relids relidsToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ scratch.root = root;
+ scratch.relids = palloc(root->simple_rel_array_size * sizeof(Index));
+ scratch.row_marks = palloc0(root->simple_rel_array_size * sizeof(PlanRowMark *));
+ scratch.joinrelids = NULL;
+
+ /* Collect row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ /* Can't have more than one row mark for a relation. */
+ Assert(scratch.row_marks[rowMark->rti] == NULL);
+
+ scratch.row_marks[rowMark->rti] = rowMark;
+ }
+
+ /* Finally, remove the joins. */
+ relidsToRemove = remove_self_joins_recurse(&scratch,
+ joinlist,
+ relidsToRemove);
+ while ((relid = bms_next_member(relidsToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ pfree(scratch.relids);
+ pfree(scratch.row_marks);
+ return joinlist;
+}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e1a13e20c5..444ebdb63c 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 731ff708b9..4f831bb17c 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -40,14 +40,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -598,7 +594,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -703,7 +699,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1037,7 +1033,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1050,9 +1046,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1063,8 +1059,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1073,7 +1069,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1099,7 +1095,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1109,7 +1105,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 17579eeaca..6baabd7273 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1111,6 +1111,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 23dec14cbd..4b7aedfe3e 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -295,6 +295,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 777655210b..b237005408 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern double cursor_tuple_fraction;
+extern bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -106,6 +107,7 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern List * remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 81b42c601b..576c08b4b2 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4774,6 +4774,337 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sl t2
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 81bdacf59d..9847097700 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -103,10 +103,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(18 rows)
+(19 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 9887fe0c0b..7168900bc8 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1705,6 +1705,172 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.25.1
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested
I've built and tested this, and it seems to function correctly to me. One question I have is whether the added "IS NOT NULL" filters can be omitted when they're unnecessary. Some of the resulting plans included an "IS NOT NULL" filter on a non-nullable column. To be clear, this is still an improvement (to me) without that.
Here's the simple test script I ran, on master ("before") and with the latest patch applied ("after").
CREATE TABLE users (id BIGINT PRIMARY KEY, nullable_int BIGINT UNIQUE, some_non_nullable_int BIGINT NOT NULL);
CREATE VIEW only_some_users AS (SELECT * FROM users WHERE id < 10);
CREATE VIEW some_other_users AS (SELECT * FROM users WHERE id > 3);
EXPLAIN SELECT *
FROM users u1
INNER JOIN users u2
ON u1.id = u2.id;
-- before does HASH JOIN
-- after does seq scan with "id IS NOT NULL" condition
EXPLAIN SELECT *
FROM only_some_users
INNER JOIN some_other_users
ON only_some_users.id = some_other_users.id;
-- before does HASH JOIN
-- after does no JOIN, instead does scan, with an extra "id IS NOT NULL condition" (in addition to id < 10, id > 3)
EXPLAIN SELECT *
FROM users u1
INNER JOIN users u2
ON u1.nullable_int = u2.nullable_int;
-- before does HASH JOIN
-- after does scan with (nullable_int IS NOT NULL) filter
EXPLAIN SELECT *
FROM users u1
INNER JOIN users u2
ON u1.id = u2.nullable_int;
-- before does HASH JOIN
-- after correctly unchanged
EXPLAIN SELECT *
FROM users u1
INNER JOIN users u2
ON u1.id = u2.some_non_nullable_int
INNER JOIN users u3
ON u2.some_non_nullable_int = u3.id;
-- before does 2x HASH JOIN
-- now does 1x HASH JOIN, with a sequential scan over users filtered by id IS NOT NULL
On 12/3/21 14:05, Hywel Carver wrote:
I've built and tested this, and it seems to function correctly to me. One question I have is whether the added "IS NOT NULL" filters can be omitted when they're unnecessary. Some of the resulting plans included an "IS NOT NULL" filter on a non-nullable column. To be clear, this is still an improvement (to me) without that.
New version of the feature. Deeply refactored with main goal - to reduce
the code size) and rebased on current master.
Here I didn't work on 'unnecessary IS NOT NULL filter'.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
v26-0001-Remove-self-joins.patchtext/plain; charset=UTF-8; name=v26-0001-Remove-self-joins.patch; x-mac-creator=0; x-mac-type=0Download
From eee0c5f0de35d8cb83e6c4ca7749020acb18a4d1 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Wed, 28 Apr 2021 18:27:53 +0500
Subject: [PATCH] Remove self-joins.
Remove inner joins of a relation to itself if can be proven that such
join can be replaced with a scan. We can build the required proofs of
uniqueness using the existing innerrel_is_unique machinery.
We can remove a self-join when for each outer row, if:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars then the inner row
is (physically) same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals looks like a.x = b.x
2. Collect all another join quals.
3. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. Proved, that this join is self-join and can be replaced by
a scan.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
src/backend/optimizer/plan/analyzejoins.c | 942 +++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/util/joininfo.c | 3 +
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/misc/guc.c | 10 +
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/planmain.h | 2 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 331 ++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 166 ++++
12 files changed, 1511 insertions(+), 29 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 37eb64bcef..dd5c4d2bd3 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids joinrelids);
+ Relids joinrelids, int subst_relid);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
/*
@@ -86,7 +92,7 @@ restart:
remove_rel_from_query(root, innerrelid,
bms_union(sjinfo->min_lefthand,
- sjinfo->min_righthand));
+ sjinfo->min_righthand), 0);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -300,7 +306,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
*
* We are not terribly thorough here. We must make sure that the rel is
* no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +318,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* lists, but only if they belong to the outer join identified by joinrelids.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+ int subst_relid)
{
RelOptInfo *rel = find_base_rel(root, relid);
List *joininfos;
Index rti;
ListCell *l;
+ Assert(subst_relid == 0 || relid != subst_relid);
+
/*
* Mark the rel as "dead" to show it is no longer part of the join tree.
* (Removing it from the baserel array altogether seems too risky.)
@@ -345,8 +357,11 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ change_relid(otherrel->attr_needed[attroff], relid, subst_relid);
}
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, relid, subst_relid);
}
/*
@@ -361,10 +376,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(l);
- sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, relid);
- sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid);
- sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid);
- sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid);
+ sjinfo->min_lefthand = change_relid(sjinfo->min_lefthand, relid, subst_relid);
+ sjinfo->min_righthand = change_relid(sjinfo->min_righthand, relid, subst_relid);
+ sjinfo->syn_lefthand = change_relid(sjinfo->syn_lefthand, relid, subst_relid);
+ sjinfo->syn_righthand = change_relid(sjinfo->syn_righthand, relid, subst_relid);
+
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, relid, subst_relid);
}
/*
@@ -385,16 +402,29 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (subst_relid == 0 && bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at))
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
else
{
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
+ phinfo->ph_eval_at = change_relid(phinfo->ph_eval_at, relid, subst_relid);
Assert(!bms_is_empty(phinfo->ph_eval_at));
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
+ phinfo->ph_needed = change_relid(phinfo->ph_needed, relid, subst_relid);
+ Assert(subst_relid != 0 || !bms_is_member(relid, phinfo->ph_lateral));
+ phinfo->ph_lateral = change_relid(phinfo->ph_lateral, relid, subst_relid);
+ phinfo->ph_var->phrels = change_relid(phinfo->ph_var->phrels, relid, subst_relid);
+ }
+ }
+
+ if (subst_relid != 0)
+ {
+ foreach(l, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ change_rinfo(rinfo, relid, subst_relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
}
}
@@ -418,6 +448,7 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ change_rinfo(rinfo, relid, subst_relid);
if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
{
@@ -433,6 +464,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
relid);
distribute_restrictinfo_to_rels(root, rinfo);
}
+ else if (subst_relid != 0)
+ distribute_restrictinfo_to_rels(root, rinfo);
}
/*
@@ -1118,3 +1151,886 @@ is_innerrel_unique_for(PlannerInfo *root,
/* Let rel_is_distinct_for() do the hard work */
return rel_is_distinct_for(root, innerrel, clause_list);
}
+
+typedef struct ChangeVarnoContext
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ if (newRelid == 0)
+ return;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static Bitmapset*
+change_relid(Relids relids, Index oldId, Index newId)
+{
+ if (newId == 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ if (to == 0)
+ return;
+
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = change_relid(rinfo->required_relids, from, to);
+ rinfo->left_relids = change_relid(rinfo->left_relids, from, to);
+ rinfo->right_relids = change_relid(rinfo->right_relids, from, to);
+ rinfo->outer_relids = change_relid(rinfo->outer_relids, from, to);
+ rinfo->nullable_relids = change_relid(rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ em->em_relids = change_relid(em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+ int cc=0;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ {
+ *sources = list_delete_cell(*sources, cell);
+ cc++;
+ }
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *cell;
+ int i;
+ List *target = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Now, baserestrictinfo replenished with restrictions from removing
+ * relation. It is needed to remove duplicates and replace degenerated
+ * clauses with a NullTest.
+ */
+ foreach(cell, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ ListCell *otherCell;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+
+ /* Search for duplicates. */
+ foreach(otherCell, target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ target = lappend(target, rinfo);
+ }
+
+ list_free(toKeep->baserestrictinfo);
+ toKeep->baserestrictinfo = target;
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ ec->ec_relids = change_relid(ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, kmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(kmark->rti == kmark->prti);
+
+ rmark->rti = toKeep->relid;
+ rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, toRemove->relid, toKeep->relid);
+ change_varno((Expr *) root->parse->havingQual, toRemove->relid, toKeep->relid);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *rjoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ if (bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ rjoinquals = lappend(rjoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = rjoinquals;
+}
+
+static bool
+tlist_contains_rel_exprs(PlannerInfo *root, Relids relids, RelOptInfo *rel)
+{
+ ListCell *vars;
+
+ foreach(vars, rel->reltarget->exprs)
+ {
+ Var *var = (Var *) lfirst(vars);
+ RelOptInfo *baserel;
+ int ndx;
+
+ /*
+ * Ignore PlaceHolderVars in the input tlists; we'll make our own
+ * decisions about whether to copy them.
+ */
+ if (IsA(var, PlaceHolderVar))
+ return true;
+
+ /*
+ * Otherwise, anything in a baserel or joinrel targetlist ought to be
+ * a Var. (More general cases can only appear in appendrel child
+ * rels, which will never be seen here.)
+ */
+ if (!IsA(var, Var))
+ elog(ERROR, "unexpected node type in rel targetlist: %d",
+ (int) nodeTag(var));
+
+ /* Get the Var's original base rel */
+ baserel = find_base_rel(root, var->varno);
+
+ /* Is it still needed above this joinrel? */
+ ndx = var->varattno - baserel->min_attr;
+ if (bms_nonempty_difference(baserel->attr_needed[ndx], relids))
+ return true;
+ }
+ return false;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids joinrelids = NULL;
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ if (bms_num_members(relids) < 2)
+ return NULL;
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if (bms_is_member(k, info->syn_lefthand) &&
+ !bms_is_member(r, info->syn_lefthand))
+ jinfo_check = false;
+
+ if (bms_is_member(k, info->syn_righthand) &&
+ !bms_is_member(r, info->syn_righthand))
+ jinfo_check = false;
+
+ if (bms_is_member(r, info->syn_lefthand) &&
+ !bms_is_member(k, info->syn_lefthand))
+ jinfo_check = false;
+
+ if (bms_is_member(r, info->syn_righthand) &&
+ !bms_is_member(k, info->syn_righthand))
+ jinfo_check = false;
+
+ if (!jinfo_check)
+ break;
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * Have a chance to remove join if target list contains vars from
+ * the only one relation.
+ */
+ if (list_length(otherjoinquals) == 0)
+ {
+ /* Can't determine uniqueness without any quals. */
+ continue;
+
+ }
+ else if (!tlist_contains_rel_exprs(root, joinrelids, inner))
+ {
+ /*
+ * TODO:
+ * In this case, we only have a chance in the case of a
+ * foreign key reference.
+ */
+ continue;
+ }
+ else
+ /*
+ * The target list contains vars from both inner and outer
+ * relations.
+ */
+ continue;
+ }
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ else if (!innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0))
+ continue;
+
+ /* See for row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+ }
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+
+ /*
+ * Add join restrictions to joininfo of removing relation to simplify
+ * the relids replacing procedure.
+ */
+ outer->joininfo = list_concat(outer->joininfo, restrictlist);
+
+ /* Firstly, replace index of excluding relation with keeping. */
+ remove_rel_from_query(root, outer->relid, joinrelids, inner->relid);
+
+ /* Secondly, fix restrictions of keeping relation */
+ remove_self_join_rel(root, imark, omark, inner, outer);
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+static Relids
+remove_self_joins_one_level(PlannerInfo *root, Relids relids, Relids ToRemove)
+{
+ while (!bms_is_empty(relids))
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int i;
+
+ i = bms_first_member(relids);
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((i = bms_next_member(relids, i)) > 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[i];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, i);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+ return ToRemove;
+}
+
+/*
+ * For each level of joinlist form a set of base relations and launch the
+ * routine of the self-join removal optimization. Recurse into sub-joinlists to
+ * handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *lc;
+ Relids relids = NULL;
+
+ /* Collect the ids of base relations at one level of the join tree. */
+ foreach (lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root,
+ (List *) lfirst(lc),
+ ToRemove);
+ break;
+ case T_RangeTblRef:
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind != RTE_RELATION ||
+ rte->relkind != RELKIND_RELATION ||
+ root->simple_rel_array[ref->rtindex] == NULL)
+ break;
+
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (bms_num_members(relids) > join_collapse_limit)
+ break;
+ }
+ break;
+ default:
+ Assert(false);
+ }
+ }
+
+ if (bms_num_members(relids) >= 2)
+ ToRemove = remove_self_joins_one_level(root, relids, ToRemove);
+
+ bms_free(relids);
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove
+ * unnecessary range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
+}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 273ac0acf7..28a55b0f42 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/joininfo.c b/src/backend/optimizer/util/joininfo.c
index 717808b037..3859b4843a 100644
--- a/src/backend/optimizer/util/joininfo.c
+++ b/src/backend/optimizer/util/joininfo.c
@@ -130,6 +130,9 @@ remove_join_clause_from_rels(PlannerInfo *root,
{
RelOptInfo *rel = find_base_rel(root, cur_relid);
+ if (!list_member_ptr(rel->joininfo, restrictinfo))
+ continue;
+
/*
* Remove the restrictinfo from the list. Pointer comparison is
* sufficient.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e105a4d5f1..8aa8993749 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -40,14 +40,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -599,7 +595,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -705,7 +701,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1056,7 +1052,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1069,9 +1065,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1082,8 +1078,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1092,7 +1088,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1118,7 +1114,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1128,7 +1124,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f2c7c2486b..b4d0a4ed9f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1156,6 +1156,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 53261ee91f..54d1a45fc5 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -306,6 +306,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index bf1adfc52a..629bda4b0b 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern double cursor_tuple_fraction;
+extern bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -107,6 +108,7 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern List * remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 86fd3907c5..7cd13c2b33 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4880,6 +4880,337 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b
+---+---
+ 2 | 1
+(1 row)
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sl t2
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 0bb558d93c..dd005da156 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -112,10 +112,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_resultcache | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(20 rows)
+(21 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 7f866c603b..3d58a1b054 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1734,6 +1734,172 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int);
+insert into sj values (1, null), (null, 2), (2, 1);
+analyze sj;
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.31.1
On 5/8/21 2:00 AM, Hywel Carver wrote:
On Fri, May 7, 2021 at 8:23 AM Andrey Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
Here I didn't work on 'unnecessary IS NOT NULL filter'.I've tested the new patch, and it is giving the same improved behaviour
as the old patch.
Thank you for this efforts!
I cleaned the code of previous version, improved regression tests and
rebased on current master.
Also, I see that we could do additional optimizations for an
EC-generated selfjoin clause (See equivclass.patch for necessary
changes). Example:
explain (costs off)
select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b =
t2.a;
QUERY PLAN
-----------------------------------------------------
Seq Scan on sj t2
Filter: ((a IS NOT NULL) AND (b = a) AND (a = b))
(2 rows)
But I'm not sure that this patch need to be a part of the self-join
removal feature because of code complexity.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
equivclass.patchtext/x-patch; charset=UTF-8; name=equivclass.patchDownload
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 6f1abbe47d..12a1d390b7 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1623,8 +1623,21 @@ generate_join_implied_equalities_normal(PlannerInfo *root,
EquivalenceMember *best_inner_em = NULL;
Oid best_eq_op = InvalidOid;
int best_score = -1;
+ int max_score = 3;
RestrictInfo *rinfo;
+ /* The case of possible self-join */
+ if (bms_num_members(outer_relids) == 1 &&
+ bms_num_members(inner_relids) == 1)
+ {
+ int orel = bms_next_member(outer_relids, -1);
+ int irel = bms_next_member(inner_relids, -1);
+
+ if (root->simple_rte_array[irel]->relid ==
+ root->simple_rte_array[orel]->relid)
+ max_score = 4;
+ }
+
foreach(lc1, outer_members)
{
EquivalenceMember *outer_em = (EquivalenceMember *) lfirst(lc1);
@@ -1653,17 +1666,31 @@ generate_join_implied_equalities_normal(PlannerInfo *root,
if (op_hashjoinable(eq_op,
exprType((Node *) outer_em->em_expr)))
score++;
+ if (score == 3 && score < max_score)
+ {
+ /* Look for self-join clause */
+ Var *outer_var = (Var *) (IsA(outer_em->em_expr, Var) ?
+ outer_em->em_expr :
+ ((RelabelType *) outer_em->em_expr)->arg);
+ Var *inner_var = (Var *) (IsA(inner_em->em_expr, Var) ?
+ inner_em->em_expr :
+ ((RelabelType *) inner_em->em_expr)->arg);
+
+ if (outer_var->varattno == inner_var->varattno)
+ score++;
+ }
+
if (score > best_score)
{
best_outer_em = outer_em;
best_inner_em = inner_em;
best_eq_op = eq_op;
best_score = score;
- if (best_score == 3)
+ if (best_score == max_score)
break; /* no need to look further */
}
}
- if (best_score == 3)
+ if (best_score == max_score)
break; /* no need to look further */
}
if (best_score < 0)
v27-0001-Remove-self-joins.patchtext/x-patch; charset=UTF-8; name=v27-0001-Remove-self-joins.patchDownload
From 836049b1467ded2f257ffe1844e5656b3f273d6c Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Wed, 28 Apr 2021 18:27:53 +0500
Subject: [PATCH] Remove self-joins.
Remove inner joins of a relation to itself if could prove that the join
can be replaced with a scan. We can proof the uniqueness
using the existing innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals looks like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. So proved, that this join is self-join and can be replaced by
a scan.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
src/backend/optimizer/plan/analyzejoins.c | 890 +++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/util/joininfo.c | 3 +
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/misc/guc.c | 10 +
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/planmain.h | 2 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 399 ++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 189 +++++
12 files changed, 1550 insertions(+), 29 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 37eb64bcef..a8e638f6e7 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids joinrelids);
+ Relids joinrelids, int subst_relid);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
/*
@@ -86,7 +92,7 @@ restart:
remove_rel_from_query(root, innerrelid,
bms_union(sjinfo->min_lefthand,
- sjinfo->min_righthand));
+ sjinfo->min_righthand), 0);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -300,7 +306,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
*
* We are not terribly thorough here. We must make sure that the rel is
* no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +318,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* lists, but only if they belong to the outer join identified by joinrelids.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+ int subst_relid)
{
RelOptInfo *rel = find_base_rel(root, relid);
List *joininfos;
Index rti;
ListCell *l;
+ Assert(subst_relid == 0 || relid != subst_relid);
+
/*
* Mark the rel as "dead" to show it is no longer part of the join tree.
* (Removing it from the baserel array altogether seems too risky.)
@@ -345,8 +357,11 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ change_relid(otherrel->attr_needed[attroff], relid, subst_relid);
}
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, relid, subst_relid);
}
/*
@@ -361,10 +376,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(l);
- sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, relid);
- sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid);
- sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid);
- sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid);
+ sjinfo->min_lefthand = change_relid(sjinfo->min_lefthand, relid, subst_relid);
+ sjinfo->min_righthand = change_relid(sjinfo->min_righthand, relid, subst_relid);
+ sjinfo->syn_lefthand = change_relid(sjinfo->syn_lefthand, relid, subst_relid);
+ sjinfo->syn_righthand = change_relid(sjinfo->syn_righthand, relid, subst_relid);
+
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, relid, subst_relid);
}
/*
@@ -385,16 +402,29 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (subst_relid == 0 && bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at))
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
else
{
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
+ phinfo->ph_eval_at = change_relid(phinfo->ph_eval_at, relid, subst_relid);
Assert(!bms_is_empty(phinfo->ph_eval_at));
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
+ phinfo->ph_needed = change_relid(phinfo->ph_needed, relid, subst_relid);
+ Assert(subst_relid != 0 || !bms_is_member(relid, phinfo->ph_lateral));
+ phinfo->ph_lateral = change_relid(phinfo->ph_lateral, relid, subst_relid);
+ phinfo->ph_var->phrels = change_relid(phinfo->ph_var->phrels, relid, subst_relid);
+ }
+ }
+
+ if (subst_relid != 0)
+ {
+ foreach(l, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ change_rinfo(rinfo, relid, subst_relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
}
}
@@ -418,6 +448,7 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ change_rinfo(rinfo, relid, subst_relid);
if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
{
@@ -433,6 +464,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
relid);
distribute_restrictinfo_to_rels(root, rinfo);
}
+ else if (subst_relid != 0)
+ distribute_restrictinfo_to_rels(root, rinfo);
}
/*
@@ -1118,3 +1151,834 @@ is_innerrel_unique_for(PlannerInfo *root,
/* Let rel_is_distinct_for() do the hard work */
return rel_is_distinct_for(root, innerrel, clause_list);
}
+
+typedef struct ChangeVarnoContext
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ if (newRelid == 0)
+ return;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static Bitmapset*
+change_relid(Relids relids, Index oldId, Index newId)
+{
+ if (newId == 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal;
+
+ if (to == 0)
+ return;
+
+ is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = change_relid(rinfo->required_relids, from, to);
+ rinfo->left_relids = change_relid(rinfo->left_relids, from, to);
+ rinfo->right_relids = change_relid(rinfo->right_relids, from, to);
+ rinfo->outer_relids = change_relid(rinfo->outer_relids, from, to);
+ rinfo->nullable_relids = change_relid(rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ em->em_relids = change_relid(em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+ int cc=0;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ {
+ *sources = list_delete_cell(*sources, cell);
+ cc++;
+ }
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *cell;
+ int i;
+ List *target = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Now, baserestrictinfo replenished with restrictions from removing
+ * relation. It is needed to remove duplicates and replace degenerated
+ * clauses with a NullTest.
+ */
+ foreach(cell, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ ListCell *otherCell;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+
+ /* Search for duplicates. */
+ foreach(otherCell, target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ target = lappend(target, rinfo);
+ }
+
+ list_free(toKeep->baserestrictinfo);
+ toKeep->baserestrictinfo = target;
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ ec->ec_relids = change_relid(ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, kmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(kmark->rti == kmark->prti);
+
+ rmark->rti = toKeep->relid;
+ rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, toRemove->relid, toKeep->relid);
+ change_varno((Expr *) root->parse->havingQual, toRemove->relid, toKeep->relid);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *rjoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ if (bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ rjoinquals = lappend(rjoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = rjoinquals;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids joinrelids = NULL;
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ if (bms_num_members(relids) < 2)
+ return NULL;
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if (bms_is_member(k, info->syn_lefthand) &&
+ !bms_is_member(r, info->syn_lefthand))
+ jinfo_check = false;
+
+ if (bms_is_member(k, info->syn_righthand) &&
+ !bms_is_member(r, info->syn_righthand))
+ jinfo_check = false;
+
+ if (bms_is_member(r, info->syn_lefthand) &&
+ !bms_is_member(k, info->syn_lefthand))
+ jinfo_check = false;
+
+ if (bms_is_member(r, info->syn_righthand) &&
+ !bms_is_member(k, info->syn_righthand))
+ jinfo_check = false;
+
+ if (!jinfo_check)
+ break;
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX:
+ * we would detect self-join without quals like 'x==x' if we had
+ * an foreign key constraint on some of other quals and this join
+ * haven't any columns from the outer in the target list.
+ * But it is still complex task.
+ */
+ continue;
+ }
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ else if (!innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0))
+ continue;
+
+ /* See for row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+ }
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+
+ /*
+ * Add join restrictions to joininfo of removing relation to simplify
+ * the relids replacing procedure.
+ */
+ outer->joininfo = list_concat(outer->joininfo, restrictlist);
+
+ /* Firstly, replace index of excluding relation with keeping. */
+ remove_rel_from_query(root, outer->relid, joinrelids, inner->relid);
+
+ /* Secondly, fix restrictions of keeping relation */
+ remove_self_join_rel(root, imark, omark, inner, outer);
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+static Relids
+remove_self_joins_one_level(PlannerInfo *root, Relids relids, Relids ToRemove)
+{
+ while (!bms_is_empty(relids))
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int i;
+
+ i = bms_first_member(relids);
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((i = bms_next_member(relids, i)) > 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[i];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, i);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+ return ToRemove;
+}
+
+/*
+ * For each level of joinlist form a set of base relations and launch the
+ * routine of the self-join removal optimization. Recurse into sub-joinlists to
+ * handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *lc;
+ Relids relids = NULL;
+
+ /* Collect the ids of base relations at one level of the join tree. */
+ foreach (lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root,
+ (List *) lfirst(lc),
+ ToRemove);
+ break;
+ case T_RangeTblRef:
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind != RTE_RELATION ||
+ rte->relkind != RELKIND_RELATION ||
+ root->simple_rel_array[ref->rtindex] == NULL)
+ break;
+
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (bms_num_members(relids) > join_collapse_limit)
+ break;
+ }
+ break;
+ default:
+ Assert(false);
+ }
+ }
+
+ if (bms_num_members(relids) >= 2)
+ ToRemove = remove_self_joins_one_level(root, relids, ToRemove);
+
+ bms_free(relids);
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove
+ * unnecessary range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
+}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 273ac0acf7..28a55b0f42 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/joininfo.c b/src/backend/optimizer/util/joininfo.c
index 717808b037..3859b4843a 100644
--- a/src/backend/optimizer/util/joininfo.c
+++ b/src/backend/optimizer/util/joininfo.c
@@ -130,6 +130,9 @@ remove_join_clause_from_rels(PlannerInfo *root,
{
RelOptInfo *rel = find_base_rel(root, cur_relid);
+ if (!list_member_ptr(rel->joininfo, restrictinfo))
+ continue;
+
/*
* Remove the restrictinfo from the list. Pointer comparison is
* sufficient.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e105a4d5f1..8aa8993749 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -40,14 +40,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -599,7 +595,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -705,7 +701,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1056,7 +1052,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1069,9 +1065,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1082,8 +1078,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1092,7 +1088,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1118,7 +1114,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1128,7 +1124,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ee731044b6..d02e47bdf7 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1153,6 +1153,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 53261ee91f..54d1a45fc5 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -306,6 +306,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index bf1adfc52a..629bda4b0b 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern double cursor_tuple_fraction;
+extern bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -107,6 +108,7 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern List * remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index fec0325e73..5fd05c4125 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4880,6 +4880,405 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sl t2
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 0bb558d93c..dd005da156 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -112,10 +112,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_resultcache | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(20 rows)
+(21 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 7f866c603b..49e84522ef 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1734,6 +1734,195 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.25.1
Import Notes
Reply to msg id not found: CAFcA2FaVttiTStj2qQEuuSUY53ve7CM856EAWvbkXRrNAWeaQ@mail.gmail.com
On 12/3/21 12:05, Hywel Carver wrote:
I've built and tested this, and it seems to function correctly to me. One question I have is whether the added "IS NOT NULL" filters can be omitted when they're unnecessary. Some of the resulting plans included an "IS NOT NULL" filter on a non-nullable column. To be clear, this is still an improvement (to me) without that.
I think, here we could ask more general question: do we want to remove a
'IS NOT NULL' clause from the clause list if the rest of the list
implicitly implies it?
Right now we don't analyze list of clauses at all:
CREATE TABLE a (x int);
EXPLAIN (COSTS OFF)
SELECT * FROM a WHERE (x < 1) AND (X
IS NOT NULL) AND
(x < 1) AND (X IS NOT NULL);
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on a
Filter: ((x IS NOT NULL) AND (x IS NOT NULL) AND (x < 1) AND (x < 1))
And even worse:
EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM a WHERE (X IS NOT NULL) AND (X IS NULL);
QUERY PLAN
----------------------------------------------------------------------------------------------------
Seq Scan on public.a (cost=0.00..15.00 rows=87 width=4) (actual
time=0.136..0.136 rows=0 loops=1)
Output: x
Filter: ((a.x IS NOT NULL) AND (a.x IS NULL))
Rows Removed by Filter: 1000
It could reduce a number of selectivity mistakes, but increase CPU
consumption.
If we had such a clause analyzing machinery, we could trivially remove
this unneeded qual.
--
regards,
Andrey Lepikhov
Postgres Professional
On Wed, 2021-06-30 at 14:21 +0300, Andrey Lepikhov wrote:
I think, here we could ask more general question: do we want to remove a
'IS NOT NULL' clause from the clause list if the rest of the list
implicitly implies it?EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM a WHERE (X IS NOT NULL) AND (X IS NULL);
QUERY PLAN----------------------------------------------------------------------------------------------------
Seq Scan on public.a (cost=0.00..15.00 rows=87 width=4) (actual time=0.136..0.136 rows=0 loops=1)
Output: x
Filter: ((a.x IS NOT NULL) AND (a.x IS NULL))
Rows Removed by Filter: 1000It could reduce a number of selectivity mistakes, but increase CPU
consumption.
If we had such a clause analyzing machinery, we could trivially remove
this unneeded qual.
On the other hand, sometimes something like that can be used to change
the optimizer's estimates to encourage certain plans.
We also don't optimize "ORDER BY x + 0" (which now you can use to prevent
an index scan) or the famous OFFSET 0, partly because it saves planning time,
partly because those can be useful tools.
Generally I have the impression that we are not too keen on spending
planning time on optimizing cases that can be trivially improved by rewriting
the query.
Yours,
Laurenz Albe
On 30/6/21 18:55, Laurenz Albe wrote:
On Wed, 2021-06-30 at 14:21 +0300, Andrey Lepikhov wrote:
I think, here we could ask more general question: do we want to remove a
'IS NOT NULL' clause from the clause list if the rest of the list
implicitly implies it?Generally I have the impression that we are not too keen on spending
planning time on optimizing cases that can be trivially improved by rewriting
the query.
Thank you, It's strong argument.
Such feature would look better as a part of an enterprise edition.
--
regards,
Andrey Lepikhov
Postgres Professional
Generally I have the impression that we are not too keen on spending
planning time on optimizing cases that can be trivially improved by
rewriting the query.
Well in some cases they can't, when the query is not emitting redundant
predicates by itself but they are added by something else like a view or a RLS
policy.
Maybe it would be worth it to allow spending a bit more time planning for
those cases ?
Regards,
--
Ronan Dunklau
On Wed, Jun 30, 2021 at 12:21 PM Andrey Lepikhov <a.lepikhov@postgrespro.ru>
wrote:
On 12/3/21 12:05, Hywel Carver wrote:
I've built and tested this, and it seems to function correctly to me.
One question I have is whether the added "IS NOT NULL" filters can be
omitted when they're unnecessary. Some of the resulting plans included an
"IS NOT NULL" filter on a non-nullable column. To be clear, this is still
an improvement (to me) without that.
I think, here we could ask more general question: do we want to remove a
'IS NOT NULL' clause from the clause list if the rest of the list
implicitly implies it?
My suggestion was not to remove it, but to avoid adding it in the first
place. When your optimisation has found a join on a group of columns under
a uniqueness constraint, you would do something like this (forgive the
pseudo-code)
foreach(column, join_clause) {
if(column.nullable) { // This condition is what I'm suggesting is added
add_null_test(column, IS_NOT_NULL);
}
}
But it may be that that's not possible or practical at this point in the
code.
On 2/7/21 01:56, Hywel Carver wrote:
On Wed, Jun 30, 2021 at 12:21 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
I think, here we could ask more general question: do we want to
remove a
'IS NOT NULL' clause from the clause list if the rest of the list
implicitly implies it?My suggestion was not to remove it, but to avoid adding it in the first
place. When your optimisation has found a join on a group of columns
under a uniqueness constraint, you would do something like this (forgive
the pseudo-code)foreach(column, join_clause) {
if(column.nullable) { // This condition is what I'm suggesting is added
add_null_test(column, IS_NOT_NULL);
}
}But it may be that that's not possible or practical at this point in the
code.
I think, such option will require to implement a new machinery to prove
that arbitrary column couldn't produce NULL value.
--
regards,
Andrey Lepikhov
Postgres Professional
On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov <a.lepikhov@postgrespro.ru>
wrote:
On 2/7/21 01:56, Hywel Carver wrote:
On Wed, Jun 30, 2021 at 12:21 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
I think, here we could ask more general question: do we want to
remove a
'IS NOT NULL' clause from the clause list if the rest of the list
implicitly implies it?My suggestion was not to remove it, but to avoid adding it in the first
place. When your optimisation has found a join on a group of columns
under a uniqueness constraint, you would do something like this (forgive
the pseudo-code)foreach(column, join_clause) {
if(column.nullable) { // This condition is what I'm suggesting isadded
add_null_test(column, IS_NOT_NULL);
}
}But it may be that that's not possible or practical at this point in the
code.I think, such option will require to implement a new machinery to prove
that arbitrary column couldn't produce NULL value.
Got it, and it makes sense to me that this would be out of scope for this
change.
I remember in the previous conversation about this, Tomas acknowledged that
while there are some silly queries that would benefit from this change,
there are also some well-written ones (e.g. properly denormalised table
structures, with decomposed views that need joining together in some
queries). So the optimization needs to be essentially free to run to
minimise impact on other queries.
Looking through the email chain, a previous version of this patch added
~0.6% to planning time in the worst case tested - does that meet the
"essentially free" requirement?
On Thu, May 27, 2021 at 12:21 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
On 5/8/21 2:00 AM, Hywel Carver wrote:
On Fri, May 7, 2021 at 8:23 AM Andrey Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
Here I didn't work on 'unnecessary IS NOT NULL filter'.I've tested the new patch, and it is giving the same improved behaviour
as the old patch.Thank you for this efforts!
I cleaned the code of previous version, improved regression tests and
rebased on current master.Also, I see that we could do additional optimizations for an
EC-generated selfjoin clause (See equivclass.patch for necessary
changes). Example:
explain (costs off)
select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b =
t2.a;
QUERY PLAN
-----------------------------------------------------
Seq Scan on sj t2
Filter: ((a IS NOT NULL) AND (b = a) AND (a = b))
(2 rows)But I'm not sure that this patch need to be a part of the self-join
removal feature because of code complexity.
The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".
Regards,
Vignesh
On 6/7/21 13:49, Hywel Carver wrote:
On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
Looking through the email chain, a previous version of this patch added
~0.6% to planning time in the worst case tested - does that meet the
"essentially free" requirement?
I think these tests weren't full coverage of possible use cases. It will
depend on a number of relations in the query. For the JOIN of
partitioned tables, for example, the overhead could grow. But in the
context of overall planning time this overhead will be small till the
large number of relations.
Also, we made this feature optional to solve possible problems.
Rebased on 768ea9bcf9
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
v28-0001-Remove-self-joins.patchtext/plain; charset=UTF-8; name=v28-0001-Remove-self-joins.patch; x-mac-creator=0; x-mac-type=0Download
From 6f9d11ec64b5b8e2304156deaea7842e0fd77c3e Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.
Remove inner joins of a relation to itself if could prove that the join
can be replaced with a scan. We can proof the uniqueness
using the existing innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals looks like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. So proved, that this join is self-join and can be replaced by
a scan.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
src/backend/optimizer/plan/analyzejoins.c | 890 +++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/util/joininfo.c | 3 +
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/misc/guc.c | 10 +
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/planmain.h | 2 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 399 ++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 189 +++++
12 files changed, 1550 insertions(+), 29 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 37eb64bcef..a8e638f6e7 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids joinrelids);
+ Relids joinrelids, int subst_relid);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
/*
@@ -86,7 +92,7 @@ restart:
remove_rel_from_query(root, innerrelid,
bms_union(sjinfo->min_lefthand,
- sjinfo->min_righthand));
+ sjinfo->min_righthand), 0);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -300,7 +306,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
*
* We are not terribly thorough here. We must make sure that the rel is
* no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +318,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* lists, but only if they belong to the outer join identified by joinrelids.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+ int subst_relid)
{
RelOptInfo *rel = find_base_rel(root, relid);
List *joininfos;
Index rti;
ListCell *l;
+ Assert(subst_relid == 0 || relid != subst_relid);
+
/*
* Mark the rel as "dead" to show it is no longer part of the join tree.
* (Removing it from the baserel array altogether seems too risky.)
@@ -345,8 +357,11 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ change_relid(otherrel->attr_needed[attroff], relid, subst_relid);
}
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, relid, subst_relid);
}
/*
@@ -361,10 +376,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(l);
- sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, relid);
- sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid);
- sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid);
- sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid);
+ sjinfo->min_lefthand = change_relid(sjinfo->min_lefthand, relid, subst_relid);
+ sjinfo->min_righthand = change_relid(sjinfo->min_righthand, relid, subst_relid);
+ sjinfo->syn_lefthand = change_relid(sjinfo->syn_lefthand, relid, subst_relid);
+ sjinfo->syn_righthand = change_relid(sjinfo->syn_righthand, relid, subst_relid);
+
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, relid, subst_relid);
}
/*
@@ -385,16 +402,29 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (subst_relid == 0 && bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at))
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
else
{
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
+ phinfo->ph_eval_at = change_relid(phinfo->ph_eval_at, relid, subst_relid);
Assert(!bms_is_empty(phinfo->ph_eval_at));
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
+ phinfo->ph_needed = change_relid(phinfo->ph_needed, relid, subst_relid);
+ Assert(subst_relid != 0 || !bms_is_member(relid, phinfo->ph_lateral));
+ phinfo->ph_lateral = change_relid(phinfo->ph_lateral, relid, subst_relid);
+ phinfo->ph_var->phrels = change_relid(phinfo->ph_var->phrels, relid, subst_relid);
+ }
+ }
+
+ if (subst_relid != 0)
+ {
+ foreach(l, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ change_rinfo(rinfo, relid, subst_relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
}
}
@@ -418,6 +448,7 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ change_rinfo(rinfo, relid, subst_relid);
if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
{
@@ -433,6 +464,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
relid);
distribute_restrictinfo_to_rels(root, rinfo);
}
+ else if (subst_relid != 0)
+ distribute_restrictinfo_to_rels(root, rinfo);
}
/*
@@ -1118,3 +1151,834 @@ is_innerrel_unique_for(PlannerInfo *root,
/* Let rel_is_distinct_for() do the hard work */
return rel_is_distinct_for(root, innerrel, clause_list);
}
+
+typedef struct ChangeVarnoContext
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ if (newRelid == 0)
+ return;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static Bitmapset*
+change_relid(Relids relids, Index oldId, Index newId)
+{
+ if (newId == 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal;
+
+ if (to == 0)
+ return;
+
+ is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = change_relid(rinfo->required_relids, from, to);
+ rinfo->left_relids = change_relid(rinfo->left_relids, from, to);
+ rinfo->right_relids = change_relid(rinfo->right_relids, from, to);
+ rinfo->outer_relids = change_relid(rinfo->outer_relids, from, to);
+ rinfo->nullable_relids = change_relid(rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ em->em_relids = change_relid(em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+ int cc=0;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ {
+ *sources = list_delete_cell(*sources, cell);
+ cc++;
+ }
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *cell;
+ int i;
+ List *target = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Now, baserestrictinfo replenished with restrictions from removing
+ * relation. It is needed to remove duplicates and replace degenerated
+ * clauses with a NullTest.
+ */
+ foreach(cell, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ ListCell *otherCell;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+
+ /* Search for duplicates. */
+ foreach(otherCell, target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ target = lappend(target, rinfo);
+ }
+
+ list_free(toKeep->baserestrictinfo);
+ toKeep->baserestrictinfo = target;
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ ec->ec_relids = change_relid(ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, kmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(kmark->rti == kmark->prti);
+
+ rmark->rti = toKeep->relid;
+ rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, toRemove->relid, toKeep->relid);
+ change_varno((Expr *) root->parse->havingQual, toRemove->relid, toKeep->relid);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *rjoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ if (bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ rjoinquals = lappend(rjoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ rjoinquals = lappend(rjoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = rjoinquals;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids joinrelids = NULL;
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ if (bms_num_members(relids) < 2)
+ return NULL;
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if (bms_is_member(k, info->syn_lefthand) &&
+ !bms_is_member(r, info->syn_lefthand))
+ jinfo_check = false;
+
+ if (bms_is_member(k, info->syn_righthand) &&
+ !bms_is_member(r, info->syn_righthand))
+ jinfo_check = false;
+
+ if (bms_is_member(r, info->syn_lefthand) &&
+ !bms_is_member(k, info->syn_lefthand))
+ jinfo_check = false;
+
+ if (bms_is_member(r, info->syn_righthand) &&
+ !bms_is_member(k, info->syn_righthand))
+ jinfo_check = false;
+
+ if (!jinfo_check)
+ break;
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX:
+ * we would detect self-join without quals like 'x==x' if we had
+ * an foreign key constraint on some of other quals and this join
+ * haven't any columns from the outer in the target list.
+ * But it is still complex task.
+ */
+ continue;
+ }
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ else if (!innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0))
+ continue;
+
+ /* See for row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+ }
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+
+ /*
+ * Add join restrictions to joininfo of removing relation to simplify
+ * the relids replacing procedure.
+ */
+ outer->joininfo = list_concat(outer->joininfo, restrictlist);
+
+ /* Firstly, replace index of excluding relation with keeping. */
+ remove_rel_from_query(root, outer->relid, joinrelids, inner->relid);
+
+ /* Secondly, fix restrictions of keeping relation */
+ remove_self_join_rel(root, imark, omark, inner, outer);
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+static Relids
+remove_self_joins_one_level(PlannerInfo *root, Relids relids, Relids ToRemove)
+{
+ while (!bms_is_empty(relids))
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int i;
+
+ i = bms_first_member(relids);
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((i = bms_next_member(relids, i)) > 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[i];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, i);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+ return ToRemove;
+}
+
+/*
+ * For each level of joinlist form a set of base relations and launch the
+ * routine of the self-join removal optimization. Recurse into sub-joinlists to
+ * handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *lc;
+ Relids relids = NULL;
+
+ /* Collect the ids of base relations at one level of the join tree. */
+ foreach (lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root,
+ (List *) lfirst(lc),
+ ToRemove);
+ break;
+ case T_RangeTblRef:
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind != RTE_RELATION ||
+ rte->relkind != RELKIND_RELATION ||
+ root->simple_rel_array[ref->rtindex] == NULL)
+ break;
+
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (bms_num_members(relids) > join_collapse_limit)
+ break;
+ }
+ break;
+ default:
+ Assert(false);
+ }
+ }
+
+ if (bms_num_members(relids) >= 2)
+ ToRemove = remove_self_joins_one_level(root, relids, ToRemove);
+
+ bms_free(relids);
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove
+ * unnecessary range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
+}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 273ac0acf7..28a55b0f42 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/joininfo.c b/src/backend/optimizer/util/joininfo.c
index 717808b037..3859b4843a 100644
--- a/src/backend/optimizer/util/joininfo.c
+++ b/src/backend/optimizer/util/joininfo.c
@@ -130,6 +130,9 @@ remove_join_clause_from_rels(PlannerInfo *root,
{
RelOptInfo *rel = find_base_rel(root, cur_relid);
+ if (!list_member_ptr(rel->joininfo, restrictinfo))
+ continue;
+
/*
* Remove the restrictinfo from the list. Pointer comparison is
* sufficient.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e105a4d5f1..8aa8993749 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -40,14 +40,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -599,7 +595,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -705,7 +701,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1056,7 +1052,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1069,9 +1065,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1082,8 +1078,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1092,7 +1088,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1118,7 +1114,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1128,7 +1124,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a2e0f8de7e..587afbffaa 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1169,6 +1169,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index f704d39980..694f5e8197 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -306,6 +306,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index bf1adfc52a..629bda4b0b 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern double cursor_tuple_fraction;
+extern bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -107,6 +108,7 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern List * remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index f3589d0dbb..0994952080 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4888,6 +4888,405 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sl t2
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 6e54f3e15e..d449d80c7a 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -112,10 +112,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(20 rows)
+(21 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index cb1c230914..59b6ea2a63 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1737,6 +1737,195 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.31.1
On Thu, Jul 15, 2021 at 7:49 AM Andrey Lepikhov <a.lepikhov@postgrespro.ru>
wrote:
On 6/7/21 13:49, Hywel Carver wrote:
On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
Looking through the email chain, a previous version of this patch added
~0.6% to planning time in the worst case tested - does that meet the
"essentially free" requirement?I think these tests weren't full coverage of possible use cases. It will
depend on a number of relations in the query. For the JOIN of
partitioned tables, for example, the overhead could grow. But in the
context of overall planning time this overhead will be small till the
large number of relations.
Also, we made this feature optional to solve possible problems.
Rebased on 768ea9bcf9--
regards,
Andrey Lepikhov
Postgres Professional
Hi,
bq. We can proof the uniqueness
proof -> prove
1. Collect all mergejoinable join quals looks like a.x = b.x
quals looks like -> quals which look like
For update_ec_sources(), the variable cc is not used.
Cheers
On Thu, Jul 15, 2021 at 8:25 AM Zhihong Yu <zyu@yugabyte.com> wrote:
On Thu, Jul 15, 2021 at 7:49 AM Andrey Lepikhov <a.lepikhov@postgrespro.ru>
wrote:On 6/7/21 13:49, Hywel Carver wrote:
On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
Looking through the email chain, a previous version of this patch added
~0.6% to planning time in the worst case tested - does that meet the
"essentially free" requirement?I think these tests weren't full coverage of possible use cases. It will
depend on a number of relations in the query. For the JOIN of
partitioned tables, for example, the overhead could grow. But in the
context of overall planning time this overhead will be small till the
large number of relations.
Also, we made this feature optional to solve possible problems.
Rebased on 768ea9bcf9--
regards,
Andrey Lepikhov
Postgres ProfessionalHi,
bq. We can proof the uniqueness
proof -> prove
1. Collect all mergejoinable join quals looks like a.x = b.x
quals looks like -> quals which look like
For update_ec_sources(), the variable cc is not used.
Cheers
Hi,
+ *otherjoinquals = rjoinquals;
Maybe rename rjoinquals as ojoinquals to align with the target variable
name.
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
Naming k as kept, r as removed would make the code more readable (remain
starts with r but has opposite meaning).
+ if (bms_is_member(r, info->syn_righthand) &&
+ !bms_is_member(k, info->syn_righthand))
+ jinfo_check = false;
+
+ if (!jinfo_check)
+ break;
There are 4 if statements where jinfo_check is assigned false. Once
jinfo_check is assigned, we can break out of the loop - instead of checking
the remaining conditions.
+ else if (!innerrel_is_unique(root, joinrelids, outer->relids,
nit: the 'else' is not needed since the if block above it goes to next
iteration of the loop.
+ /* See for row marks. */
+ foreach (lc, root->rowMarks)
It seems once imark and omark are set, we can come out of the loop.
Cheers
On 7/16/21 12:28 AM, Zhihong Yu wrote:
On Thu, Jul 15, 2021 at 8:25 AM Zhihong Yu <zyu@yugabyte.com
<mailto:zyu@yugabyte.com>> wrote:
bq. We can proof the uniquenessproof -> prove
Fixed
1. Collect all mergejoinable join quals looks like a.x = b.x
quals looks like -> quals which look like
For update_ec_sources(), the variable cc is not used.
Fixed
+ *otherjoinquals = rjoinquals;
Maybe rename rjoinquals as ojoinquals to align with the target variable
name.
Agree, fixed
+ int k; /* Index of kept relation */ + int r = -1; /* Index of removed relation */Naming k as kept, r as removed would make the code more readable (remain
starts with r but has opposite meaning).
I think it is correct now: k - index of inner (keeping) relation. r - of
outer (removing) relation.
+ if (bms_is_member(r, info->syn_righthand) && + !bms_is_member(k, info->syn_righthand)) + jinfo_check = false; + + if (!jinfo_check) + break;There are 4 if statements where jinfo_check is assigned false. Once
jinfo_check is assigned, we can break out of the loop - instead of
checking the remaining conditions.
Fixed
+ else if (!innerrel_is_unique(root, joinrelids, outer->relids,
nit: the 'else' is not needed since the if block above it goes to next
iteration of the loop.
Fixed
+ /* See for row marks. */
+ foreach (lc, root->rowMarks)It seems once imark and omark are set, we can come out of the loop.
Maybe you right. fixed.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
v29-0001-Remove-self-joins.patchtext/x-patch; charset=UTF-8; name=v29-0001-Remove-self-joins.patchDownload
From e8b4047aa71c808fa5799b2739b2ae0ab7b6d7e3 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.
Remove inner joins of a relation to itself if could prove that the join
can be replaced with a scan. We can prove the uniqueness
using the existing innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. So proved, that this join is self-join and can be replaced by
a scan.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
src/backend/optimizer/plan/analyzejoins.c | 886 +++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/util/joininfo.c | 3 +
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/misc/guc.c | 10 +
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/planmain.h | 2 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 399 ++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 189 +++++
12 files changed, 1546 insertions(+), 29 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 37eb64bcef..eb9d83b424 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids joinrelids);
+ Relids joinrelids, int subst_relid);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
/*
@@ -86,7 +92,7 @@ restart:
remove_rel_from_query(root, innerrelid,
bms_union(sjinfo->min_lefthand,
- sjinfo->min_righthand));
+ sjinfo->min_righthand), 0);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -300,7 +306,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
*
* We are not terribly thorough here. We must make sure that the rel is
* no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +318,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* lists, but only if they belong to the outer join identified by joinrelids.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+ int subst_relid)
{
RelOptInfo *rel = find_base_rel(root, relid);
List *joininfos;
Index rti;
ListCell *l;
+ Assert(subst_relid == 0 || relid != subst_relid);
+
/*
* Mark the rel as "dead" to show it is no longer part of the join tree.
* (Removing it from the baserel array altogether seems too risky.)
@@ -345,8 +357,11 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ change_relid(otherrel->attr_needed[attroff], relid, subst_relid);
}
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, relid, subst_relid);
}
/*
@@ -361,10 +376,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(l);
- sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, relid);
- sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid);
- sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid);
- sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid);
+ sjinfo->min_lefthand = change_relid(sjinfo->min_lefthand, relid, subst_relid);
+ sjinfo->min_righthand = change_relid(sjinfo->min_righthand, relid, subst_relid);
+ sjinfo->syn_lefthand = change_relid(sjinfo->syn_lefthand, relid, subst_relid);
+ sjinfo->syn_righthand = change_relid(sjinfo->syn_righthand, relid, subst_relid);
+
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, relid, subst_relid);
}
/*
@@ -385,16 +402,29 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (subst_relid == 0 && bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at))
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
else
{
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
+ phinfo->ph_eval_at = change_relid(phinfo->ph_eval_at, relid, subst_relid);
Assert(!bms_is_empty(phinfo->ph_eval_at));
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
+ phinfo->ph_needed = change_relid(phinfo->ph_needed, relid, subst_relid);
+ Assert(subst_relid != 0 || !bms_is_member(relid, phinfo->ph_lateral));
+ phinfo->ph_lateral = change_relid(phinfo->ph_lateral, relid, subst_relid);
+ phinfo->ph_var->phrels = change_relid(phinfo->ph_var->phrels, relid, subst_relid);
+ }
+ }
+
+ if (subst_relid != 0)
+ {
+ foreach(l, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ change_rinfo(rinfo, relid, subst_relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
}
}
@@ -418,6 +448,7 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ change_rinfo(rinfo, relid, subst_relid);
if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
{
@@ -433,6 +464,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
relid);
distribute_restrictinfo_to_rels(root, rinfo);
}
+ else if (subst_relid != 0)
+ distribute_restrictinfo_to_rels(root, rinfo);
}
/*
@@ -1118,3 +1151,830 @@ is_innerrel_unique_for(PlannerInfo *root,
/* Let rel_is_distinct_for() do the hard work */
return rel_is_distinct_for(root, innerrel, clause_list);
}
+
+typedef struct ChangeVarnoContext
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ if (newRelid == 0)
+ return;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static Bitmapset*
+change_relid(Relids relids, Index oldId, Index newId)
+{
+ if (newId == 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal;
+
+ if (to == 0)
+ return;
+
+ is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = change_relid(rinfo->required_relids, from, to);
+ rinfo->left_relids = change_relid(rinfo->left_relids, from, to);
+ rinfo->right_relids = change_relid(rinfo->right_relids, from, to);
+ rinfo->outer_relids = change_relid(rinfo->outer_relids, from, to);
+ rinfo->nullable_relids = change_relid(rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ em->em_relids = change_relid(em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ *sources = list_delete_cell(*sources, cell);
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *cell;
+ int i;
+ List *target = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Now, baserestrictinfo replenished with restrictions from removing
+ * relation. It is needed to remove duplicates and replace degenerated
+ * clauses with a NullTest.
+ */
+ foreach(cell, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ ListCell *otherCell;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+
+ /* Search for duplicates. */
+ foreach(otherCell, target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ target = lappend(target, rinfo);
+ }
+
+ list_free(toKeep->baserestrictinfo);
+ toKeep->baserestrictinfo = target;
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ ec->ec_relids = change_relid(ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, kmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(kmark->rti == kmark->prti);
+
+ rmark->rti = toKeep->relid;
+ rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, toRemove->relid, toKeep->relid);
+ change_varno((Expr *) root->parse->havingQual, toRemove->relid, toKeep->relid);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ if (bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids joinrelids = NULL;
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ if (bms_num_members(relids) < 2)
+ return NULL;
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if (bms_is_member(k, info->syn_lefthand) &&
+ !bms_is_member(r, info->syn_lefthand))
+ jinfo_check = false;
+ else if (bms_is_member(k, info->syn_righthand) &&
+ !bms_is_member(r, info->syn_righthand))
+ jinfo_check = false;
+ else if (bms_is_member(r, info->syn_lefthand) &&
+ !bms_is_member(k, info->syn_lefthand))
+ jinfo_check = false;
+ else if (bms_is_member(r, info->syn_righthand) &&
+ !bms_is_member(k, info->syn_righthand))
+ jinfo_check = false;
+
+ if (!jinfo_check)
+ break;
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX:
+ * we would detect self-join without quals like 'x==x' if we had
+ * an foreign key constraint on some of other quals and this join
+ * haven't any columns from the outer in the target list.
+ * But it is still complex task.
+ */
+ continue;
+ }
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0))
+ continue;
+
+ /* See for row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+
+ /*
+ * Add join restrictions to joininfo of removing relation to simplify
+ * the relids replacing procedure.
+ */
+ outer->joininfo = list_concat(outer->joininfo, restrictlist);
+
+ /* Firstly, replace index of excluding relation with keeping. */
+ remove_rel_from_query(root, outer->relid, joinrelids, inner->relid);
+
+ /* Secondly, fix restrictions of keeping relation */
+ remove_self_join_rel(root, imark, omark, inner, outer);
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+static Relids
+remove_self_joins_one_level(PlannerInfo *root, Relids relids, Relids ToRemove)
+{
+ while (!bms_is_empty(relids))
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int i;
+
+ i = bms_first_member(relids);
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((i = bms_next_member(relids, i)) > 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[i];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, i);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+ return ToRemove;
+}
+
+/*
+ * For each level of joinlist form a set of base relations and launch the
+ * routine of the self-join removal optimization. Recurse into sub-joinlists to
+ * handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *lc;
+ Relids relids = NULL;
+
+ /* Collect the ids of base relations at one level of the join tree. */
+ foreach (lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root,
+ (List *) lfirst(lc),
+ ToRemove);
+ break;
+ case T_RangeTblRef:
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind != RTE_RELATION ||
+ rte->relkind != RELKIND_RELATION ||
+ root->simple_rel_array[ref->rtindex] == NULL)
+ break;
+
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (bms_num_members(relids) > join_collapse_limit)
+ break;
+ }
+ break;
+ default:
+ Assert(false);
+ }
+ }
+
+ if (bms_num_members(relids) >= 2)
+ ToRemove = remove_self_joins_one_level(root, relids, ToRemove);
+
+ bms_free(relids);
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove
+ * unnecessary range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
+}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 273ac0acf7..28a55b0f42 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/joininfo.c b/src/backend/optimizer/util/joininfo.c
index 717808b037..3859b4843a 100644
--- a/src/backend/optimizer/util/joininfo.c
+++ b/src/backend/optimizer/util/joininfo.c
@@ -130,6 +130,9 @@ remove_join_clause_from_rels(PlannerInfo *root,
{
RelOptInfo *rel = find_base_rel(root, cur_relid);
+ if (!list_member_ptr(rel->joininfo, restrictinfo))
+ continue;
+
/*
* Remove the restrictinfo from the list. Pointer comparison is
* sufficient.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e105a4d5f1..8aa8993749 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -40,14 +40,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -599,7 +595,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -705,7 +701,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1056,7 +1052,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1069,9 +1065,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1082,8 +1078,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1092,7 +1088,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1118,7 +1114,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1128,7 +1124,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a2e0f8de7e..587afbffaa 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1169,6 +1169,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index f704d39980..694f5e8197 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -306,6 +306,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index bf1adfc52a..629bda4b0b 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern double cursor_tuple_fraction;
+extern bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -107,6 +108,7 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern List * remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index f3589d0dbb..0994952080 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4888,6 +4888,405 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sl t2
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 6e54f3e15e..d449d80c7a 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -112,10 +112,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(20 rows)
+(21 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index cb1c230914..59b6ea2a63 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1737,6 +1737,195 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.25.1
On Thu, 1 Jul 2021 at 02:38, Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
Well in some cases they can't, when the query is not emitting redundant
predicates by itself but they are added by something else like a view or a RLS
policy.
Maybe it would be worth it to allow spending a bit more time planning for
those cases ?
Yeah, I'm generally in favour of doing more work in the optimizer to
save query authors work writing queries.
My question is whether it handles cases like:
select b.x,c.y
from t
join t2 as b on (b.id = t.id)
join t2 as c on (c.id = t.id)
That is, if you join against the same table twice on the same qual.
Does the EC mechanism turn this into a qual on b.id = c.id and then
turn this into a self-join that can be removed?
That's the usual pattern I've seen this arise. Not so much that people
write self joins explicitly but that they add a join to check some
column but that is happening in some isolated piece of code that
doesn't know that that join is already in the query. You can easily
end up with a lot of joins against the same table this way.
It's not far different from the old chestnut
select (select x from t2 where id = t.id) as x,
(select y from t2 where id = t.id) as y
from t
which is actually pretty hard to avoid sometimes.
--
greg
On Thu, Jul 15, 2021 at 05:49:11PM +0300, Andrey Lepikhov wrote:
On 6/7/21 13:49, Hywel Carver wrote:
On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
Looking through the email chain, a previous version of this patch added
~0.6% to planning time in the worst case tested - does that meet the
"essentially free" requirement?I think these tests weren't full coverage of possible use cases. It will
depend on a number of relations in the query. For the JOIN of partitioned
tables, for example, the overhead could grow. But in the context of overall
planning time this overhead will be small till the large number of
relations.
Also, we made this feature optional to solve possible problems.
Rebased on 768ea9bcf9
I made some tests in a machine with 16 cores and 32GB of RAM.
So we can see if this is an improvement.
This is what I found:
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| test | mode | master | enabled | % | disabled | % |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| pgbench read only | standard | 64418.13 | 63942.94 | -0.74 | 62231.38 | -3.39 |
| pgbench read only | prepared | 108463.51 | 107002.13 | -1.35 | 100960.83 | -6.92 |
| pgbench read only | extended | 55409.65 | 56427.63 | 1.84 | 55927.62 | 0.93 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| pgbench read/write | standard | 9374.91 | 9135.21 | -2.56 | 8840.68 | -5.70 |
| pgbench read/write | prepared | 11849.86 | 11672.23 | -1.50 | 11393.39 | -3.85 |
| pgbench read/write | extended | 7976.80 | 7947.07 | -0.37 | 7788.99 | -2.35 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select non optimize 1 | standard | 80.97 | 81.29 | 0.40 | 81.30 | 0.41 |
| select non optimize 1 | prepared | 81.29 | 81.28 | -0.01 | 80.89 | -0.49 |
| select non optimize 1 | extended | 81.07 | 80.81 | -0.32 | 80.98 | -0.11 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select optimized 1 | standard | 15.84 | 13.90 |-12.25 | 15.80 | -0.25 |
| select optimized 1 | prepared | 15.24 | 13.82 | -9.32 | 15.55 | 2.03 |
| select optimized 1 | extended | 15.38 | 13.89 | -9.69 | 15.59 | 1.37 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select optimized 2 | standard | 10204.91 | 10818.39 | 6.01 | 10261.07 | 0.55 |
| select optimized 2 | prepared | 13284.06 | 15579.33 | 17.28 | 13116.22 | -1.26 |
| select optimized 2 | extended | 10143.43 | 10645.23 | 4.95 | 10142.77 | -0.01 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select shoe | standard | 5645.28 | 5661.71 | 0.29 | 6180.60 | 9.48 |
| select shoe | prepared | 9660.45 | 9602.37 | -0.60 | 9894.82 | 2.43 |
| select shoe | extended | 5666.47 | 5634.10 | -0.57 | 5757.26 | 1.60 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
Obviously the pgbench runs are from the standard script. The numbers are
not clear for me, I can see improvementes with the patch only in one
case and, for some reason, if I disable the patch
(enable_self_join_removal='off') I still see a regression in normal
cases and curiosly an improvement in one case.
I'm attaching the queries. I used the users table that is down-thread
and loaded with ~200k rows using:
insert into users
select seq, case when random() < 0.2 then null else random() * 1000 end,
random() * 10000
from generate_series(1, 1000000) seq
on conflict (nullable_int) do nothing;
for master I just dumped the data from the table and loaded it. I'm also
attaching the queries I used.
After this tests, I'm not convinced this is actually providing something
performance-wise. At least not in its current state.
--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
I don't think the benchmarking that's needed is to check whether
pruning unnecessary joins is helpful. Obviously it's going to be hard
to measure on simple queries and small tables. But the resulting plan
is unambiguously superior and in more complex cases could extra i/o.
The benchmarking people were looking for in the past was testing the
impact of the extra planning work in cases where it doesn't end up
being applied. I'm not sure what the worst case is, perhaps a many-way
self-join where the join clauses are not suitable for pruning?
On 1/3/2022 03:03, Greg Stark wrote:
On Thu, 1 Jul 2021 at 02:38, Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
Well in some cases they can't, when the query is not emitting redundant
predicates by itself but they are added by something else like a view or a RLS
policy.
Maybe it would be worth it to allow spending a bit more time planning for
those cases ?Yeah, I'm generally in favour of doing more work in the optimizer to
save query authors work writing queries.My question is whether it handles cases like:
select b.x,c.y
from t
join t2 as b on (b.id = t.id)
join t2 as c on (c.id = t.id)That is, if you join against the same table twice on the same qual.
Does the EC mechanism turn this into a qual on b.id = c.id and then
turn this into a self-join that can be removed?
Yes, the self-join removal machinery uses EC mechanism as usual to get
all join clauses. So, this case works (See demo in attachment).
Also, in new version of the patch I fixed one stupid bug: checking a
self-join candidate expression operator - we can remove only expressions
like F(arg1) = G(arg2).
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
v30-0001-Remove-self-joins.patchtext/plain; charset=UTF-8; name=v30-0001-Remove-self-joins.patchDownload
From 70398361a0a0d9c6c3c7ddd1fd305ac11138e7b1 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.
Remove inner joins of a relation to itself if could prove that the join
can be replaced with a scan. We can prove the uniqueness
using the existing innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. So proved, that this join is self-join and can be replaced by
a scan.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
src/backend/optimizer/plan/analyzejoins.c | 888 +++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/util/joininfo.c | 3 +
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/misc/guc.c | 10 +
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/planmain.h | 2 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 426 +++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 197 +++++
12 files changed, 1583 insertions(+), 29 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..c5ac8e2bd4 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids joinrelids);
+ Relids joinrelids, int subst_relid);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
/*
@@ -86,7 +92,7 @@ restart:
remove_rel_from_query(root, innerrelid,
bms_union(sjinfo->min_lefthand,
- sjinfo->min_righthand));
+ sjinfo->min_righthand), 0);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -300,7 +306,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
*
* We are not terribly thorough here. We must make sure that the rel is
* no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +318,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* lists, but only if they belong to the outer join identified by joinrelids.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+ int subst_relid)
{
RelOptInfo *rel = find_base_rel(root, relid);
List *joininfos;
Index rti;
ListCell *l;
+ Assert(subst_relid == 0 || relid != subst_relid);
+
/*
* Mark the rel as "dead" to show it is no longer part of the join tree.
* (Removing it from the baserel array altogether seems too risky.)
@@ -345,8 +357,11 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ change_relid(otherrel->attr_needed[attroff], relid, subst_relid);
}
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, relid, subst_relid);
}
/*
@@ -361,10 +376,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(l);
- sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, relid);
- sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid);
- sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid);
- sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid);
+ sjinfo->min_lefthand = change_relid(sjinfo->min_lefthand, relid, subst_relid);
+ sjinfo->min_righthand = change_relid(sjinfo->min_righthand, relid, subst_relid);
+ sjinfo->syn_lefthand = change_relid(sjinfo->syn_lefthand, relid, subst_relid);
+ sjinfo->syn_righthand = change_relid(sjinfo->syn_righthand, relid, subst_relid);
+
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, relid, subst_relid);
}
/*
@@ -385,16 +402,29 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (subst_relid == 0 && bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at))
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
else
{
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
+ phinfo->ph_eval_at = change_relid(phinfo->ph_eval_at, relid, subst_relid);
Assert(!bms_is_empty(phinfo->ph_eval_at));
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
+ phinfo->ph_needed = change_relid(phinfo->ph_needed, relid, subst_relid);
+ Assert(subst_relid != 0 || !bms_is_member(relid, phinfo->ph_lateral));
+ phinfo->ph_lateral = change_relid(phinfo->ph_lateral, relid, subst_relid);
+ phinfo->ph_var->phrels = change_relid(phinfo->ph_var->phrels, relid, subst_relid);
+ }
+ }
+
+ if (subst_relid != 0)
+ {
+ foreach(l, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ change_rinfo(rinfo, relid, subst_relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
}
}
@@ -418,6 +448,7 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ change_rinfo(rinfo, relid, subst_relid);
if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
{
@@ -433,6 +464,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
relid);
distribute_restrictinfo_to_rels(root, rinfo);
}
+ else if (subst_relid != 0)
+ distribute_restrictinfo_to_rels(root, rinfo);
}
/*
@@ -1118,3 +1151,832 @@ is_innerrel_unique_for(PlannerInfo *root,
/* Let rel_is_distinct_for() do the hard work */
return rel_is_distinct_for(root, innerrel, clause_list);
}
+
+typedef struct ChangeVarnoContext
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ if (newRelid == 0)
+ return;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static Bitmapset*
+change_relid(Relids relids, Index oldId, Index newId)
+{
+ if (newId == 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal;
+
+ if (to == 0)
+ return;
+
+ is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = change_relid(rinfo->required_relids, from, to);
+ rinfo->left_relids = change_relid(rinfo->left_relids, from, to);
+ rinfo->right_relids = change_relid(rinfo->right_relids, from, to);
+ rinfo->outer_relids = change_relid(rinfo->outer_relids, from, to);
+ rinfo->nullable_relids = change_relid(rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ em->em_relids = change_relid(em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ *sources = list_delete_cell(*sources, cell);
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *cell;
+ int i;
+ List *target = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Now, baserestrictinfo replenished with restrictions from removing
+ * relation. It is needed to remove duplicates and replace degenerated
+ * clauses with a NullTest.
+ */
+ foreach(cell, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ ListCell *otherCell;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+
+ /* Search for duplicates. */
+ foreach(otherCell, target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ target = lappend(target, rinfo);
+ }
+
+ list_free(toKeep->baserestrictinfo);
+ toKeep->baserestrictinfo = target;
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ ec->ec_relids = change_relid(ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, kmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(kmark->rti == kmark->prti);
+
+ rmark->rti = toKeep->relid;
+ rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, toRemove->relid, toKeep->relid);
+ change_varno((Expr *) root->parse->havingQual, toRemove->relid, toKeep->relid);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) ? */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids joinrelids = NULL;
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ if (bms_num_members(relids) < 2)
+ return NULL;
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if (bms_is_member(k, info->syn_lefthand) &&
+ !bms_is_member(r, info->syn_lefthand))
+ jinfo_check = false;
+ else if (bms_is_member(k, info->syn_righthand) &&
+ !bms_is_member(r, info->syn_righthand))
+ jinfo_check = false;
+ else if (bms_is_member(r, info->syn_lefthand) &&
+ !bms_is_member(k, info->syn_lefthand))
+ jinfo_check = false;
+ else if (bms_is_member(r, info->syn_righthand) &&
+ !bms_is_member(k, info->syn_righthand))
+ jinfo_check = false;
+
+ if (!jinfo_check)
+ break;
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX:
+ * we would detect self-join without quals like 'x==x' if we had
+ * an foreign key constraint on some of other quals and this join
+ * haven't any columns from the outer in the target list.
+ * But it is still complex task.
+ */
+ continue;
+ }
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0))
+ continue;
+
+ /* See for row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+
+ /*
+ * Add join restrictions to joininfo of removing relation to simplify
+ * the relids replacing procedure.
+ */
+ outer->joininfo = list_concat(outer->joininfo, restrictlist);
+
+ /* Firstly, replace index of excluding relation with keeping. */
+ remove_rel_from_query(root, outer->relid, joinrelids, inner->relid);
+
+ /* Secondly, fix restrictions of keeping relation */
+ remove_self_join_rel(root, imark, omark, inner, outer);
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+static Relids
+remove_self_joins_one_level(PlannerInfo *root, Relids relids, Relids ToRemove)
+{
+ while (!bms_is_empty(relids))
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int i;
+
+ i = bms_first_member(relids);
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((i = bms_next_member(relids, i)) > 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[i];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, i);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+ return ToRemove;
+}
+
+/*
+ * For each level of joinlist form a set of base relations and launch the
+ * routine of the self-join removal optimization. Recurse into sub-joinlists to
+ * handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *lc;
+ Relids relids = NULL;
+
+ /* Collect the ids of base relations at one level of the join tree. */
+ foreach (lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root,
+ (List *) lfirst(lc),
+ ToRemove);
+ break;
+ case T_RangeTblRef:
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind != RTE_RELATION ||
+ rte->relkind != RELKIND_RELATION ||
+ root->simple_rel_array[ref->rtindex] == NULL)
+ break;
+
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (bms_num_members(relids) > join_collapse_limit)
+ break;
+ }
+ break;
+ default:
+ Assert(false);
+ }
+ }
+
+ if (bms_num_members(relids) >= 2)
+ ToRemove = remove_self_joins_one_level(root, relids, ToRemove);
+
+ bms_free(relids);
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove
+ * unnecessary range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
+}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index c92ddd27ed..8f6d5e5639 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/joininfo.c b/src/backend/optimizer/util/joininfo.c
index d4cffdb198..dd3e323f21 100644
--- a/src/backend/optimizer/util/joininfo.c
+++ b/src/backend/optimizer/util/joininfo.c
@@ -130,6 +130,9 @@ remove_join_clause_from_rels(PlannerInfo *root,
{
RelOptInfo *rel = find_base_rel(root, cur_relid);
+ if (!list_member_ptr(rel->joininfo, restrictinfo))
+ continue;
+
/*
* Remove the restrictinfo from the list. Pointer comparison is
* sufficient.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 520409f4ba..2e73cacc0d 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -40,14 +40,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -600,7 +596,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -707,7 +703,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1059,7 +1055,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1072,9 +1068,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1085,8 +1081,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1095,7 +1091,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1121,7 +1117,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1131,7 +1127,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1e3650184b..1b674d5987 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1178,6 +1178,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 620eeda2d6..1effa9721a 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -307,6 +307,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 54a0d4c188..054b77e087 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern double cursor_tuple_fraction;
+extern bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -107,6 +108,7 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern List * remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 19caebabd0..4e9fd2367c 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4922,6 +4922,432 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sl t2
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 442eeb1e3f..fc5be62856 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -119,10 +119,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(20 rows)
+(21 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 6dd01b022e..a7c09d2573 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1753,6 +1753,203 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.25.1
Hi,
On 2022-03-04 15:47:47 +0500, Andrey Lepikhov wrote:
Also, in new version of the patch I fixed one stupid bug: checking a
self-join candidate expression operator - we can remove only expressions
like F(arg1) = G(arg2).
This CF entry currently fails tests: https://cirrus-ci.com/task/4632127944785920?logs=test_world#L1938
Looks like you're missing an adjustment of postgresql.conf.sample
Marked as waiting-on-author.
Greetings,
Andres Freund
On 3/22/22 05:58, Andres Freund wrote:
Hi,
On 2022-03-04 15:47:47 +0500, Andrey Lepikhov wrote:
Also, in new version of the patch I fixed one stupid bug: checking a
self-join candidate expression operator - we can remove only expressions
like F(arg1) = G(arg2).This CF entry currently fails tests: https://cirrus-ci.com/task/4632127944785920?logs=test_world#L1938
Looks like you're missing an adjustment of postgresql.conf.sample
Marked as waiting-on-author.
Thanks, I fixed it.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
v31-0001-Remove-self-joins.patchtext/x-patch; charset=UTF-8; name=v31-0001-Remove-self-joins.patchDownload
From 620dea31ce19965beefe545f08dcc5c8b319c434 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.
Remove inner joins of a relation to itself if could prove that the join
can be replaced with a scan. We can prove the uniqueness
using the existing innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. So proved, that this join is self-join and can be replaced by
a scan.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
src/backend/optimizer/plan/analyzejoins.c | 888 +++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/util/joininfo.c | 3 +
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/misc/guc.c | 10 +
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/planmain.h | 2 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 426 +++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 197 +++++
12 files changed, 1583 insertions(+), 29 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..c5ac8e2bd4 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids joinrelids);
+ Relids joinrelids, int subst_relid);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
/*
@@ -86,7 +92,7 @@ restart:
remove_rel_from_query(root, innerrelid,
bms_union(sjinfo->min_lefthand,
- sjinfo->min_righthand));
+ sjinfo->min_righthand), 0);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -300,7 +306,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
*
* We are not terribly thorough here. We must make sure that the rel is
* no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +318,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* lists, but only if they belong to the outer join identified by joinrelids.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+ int subst_relid)
{
RelOptInfo *rel = find_base_rel(root, relid);
List *joininfos;
Index rti;
ListCell *l;
+ Assert(subst_relid == 0 || relid != subst_relid);
+
/*
* Mark the rel as "dead" to show it is no longer part of the join tree.
* (Removing it from the baserel array altogether seems too risky.)
@@ -345,8 +357,11 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ change_relid(otherrel->attr_needed[attroff], relid, subst_relid);
}
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, relid, subst_relid);
}
/*
@@ -361,10 +376,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(l);
- sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, relid);
- sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid);
- sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid);
- sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid);
+ sjinfo->min_lefthand = change_relid(sjinfo->min_lefthand, relid, subst_relid);
+ sjinfo->min_righthand = change_relid(sjinfo->min_righthand, relid, subst_relid);
+ sjinfo->syn_lefthand = change_relid(sjinfo->syn_lefthand, relid, subst_relid);
+ sjinfo->syn_righthand = change_relid(sjinfo->syn_righthand, relid, subst_relid);
+
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, relid, subst_relid);
}
/*
@@ -385,16 +402,29 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (subst_relid == 0 && bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at))
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
else
{
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
+ phinfo->ph_eval_at = change_relid(phinfo->ph_eval_at, relid, subst_relid);
Assert(!bms_is_empty(phinfo->ph_eval_at));
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
+ phinfo->ph_needed = change_relid(phinfo->ph_needed, relid, subst_relid);
+ Assert(subst_relid != 0 || !bms_is_member(relid, phinfo->ph_lateral));
+ phinfo->ph_lateral = change_relid(phinfo->ph_lateral, relid, subst_relid);
+ phinfo->ph_var->phrels = change_relid(phinfo->ph_var->phrels, relid, subst_relid);
+ }
+ }
+
+ if (subst_relid != 0)
+ {
+ foreach(l, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ change_rinfo(rinfo, relid, subst_relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
}
}
@@ -418,6 +448,7 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ change_rinfo(rinfo, relid, subst_relid);
if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
{
@@ -433,6 +464,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
relid);
distribute_restrictinfo_to_rels(root, rinfo);
}
+ else if (subst_relid != 0)
+ distribute_restrictinfo_to_rels(root, rinfo);
}
/*
@@ -1118,3 +1151,832 @@ is_innerrel_unique_for(PlannerInfo *root,
/* Let rel_is_distinct_for() do the hard work */
return rel_is_distinct_for(root, innerrel, clause_list);
}
+
+typedef struct ChangeVarnoContext
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ if (newRelid == 0)
+ return;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static Bitmapset*
+change_relid(Relids relids, Index oldId, Index newId)
+{
+ if (newId == 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal;
+
+ if (to == 0)
+ return;
+
+ is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = change_relid(rinfo->required_relids, from, to);
+ rinfo->left_relids = change_relid(rinfo->left_relids, from, to);
+ rinfo->right_relids = change_relid(rinfo->right_relids, from, to);
+ rinfo->outer_relids = change_relid(rinfo->outer_relids, from, to);
+ rinfo->nullable_relids = change_relid(rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ em->em_relids = change_relid(em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ *sources = list_delete_cell(*sources, cell);
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *cell;
+ int i;
+ List *target = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Now, baserestrictinfo replenished with restrictions from removing
+ * relation. It is needed to remove duplicates and replace degenerated
+ * clauses with a NullTest.
+ */
+ foreach(cell, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ ListCell *otherCell;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+
+ /* Search for duplicates. */
+ foreach(otherCell, target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ target = lappend(target, rinfo);
+ }
+
+ list_free(toKeep->baserestrictinfo);
+ toKeep->baserestrictinfo = target;
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ ec->ec_relids = change_relid(ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, kmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(kmark->rti == kmark->prti);
+
+ rmark->rti = toKeep->relid;
+ rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, toRemove->relid, toKeep->relid);
+ change_varno((Expr *) root->parse->havingQual, toRemove->relid, toKeep->relid);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) ? */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids joinrelids = NULL;
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ if (bms_num_members(relids) < 2)
+ return NULL;
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if (bms_is_member(k, info->syn_lefthand) &&
+ !bms_is_member(r, info->syn_lefthand))
+ jinfo_check = false;
+ else if (bms_is_member(k, info->syn_righthand) &&
+ !bms_is_member(r, info->syn_righthand))
+ jinfo_check = false;
+ else if (bms_is_member(r, info->syn_lefthand) &&
+ !bms_is_member(k, info->syn_lefthand))
+ jinfo_check = false;
+ else if (bms_is_member(r, info->syn_righthand) &&
+ !bms_is_member(k, info->syn_righthand))
+ jinfo_check = false;
+
+ if (!jinfo_check)
+ break;
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX:
+ * we would detect self-join without quals like 'x==x' if we had
+ * an foreign key constraint on some of other quals and this join
+ * haven't any columns from the outer in the target list.
+ * But it is still complex task.
+ */
+ continue;
+ }
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0))
+ continue;
+
+ /* See for row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+
+ /*
+ * Add join restrictions to joininfo of removing relation to simplify
+ * the relids replacing procedure.
+ */
+ outer->joininfo = list_concat(outer->joininfo, restrictlist);
+
+ /* Firstly, replace index of excluding relation with keeping. */
+ remove_rel_from_query(root, outer->relid, joinrelids, inner->relid);
+
+ /* Secondly, fix restrictions of keeping relation */
+ remove_self_join_rel(root, imark, omark, inner, outer);
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+static Relids
+remove_self_joins_one_level(PlannerInfo *root, Relids relids, Relids ToRemove)
+{
+ while (!bms_is_empty(relids))
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int i;
+
+ i = bms_first_member(relids);
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((i = bms_next_member(relids, i)) > 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[i];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, i);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+ return ToRemove;
+}
+
+/*
+ * For each level of joinlist form a set of base relations and launch the
+ * routine of the self-join removal optimization. Recurse into sub-joinlists to
+ * handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *lc;
+ Relids relids = NULL;
+
+ /* Collect the ids of base relations at one level of the join tree. */
+ foreach (lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root,
+ (List *) lfirst(lc),
+ ToRemove);
+ break;
+ case T_RangeTblRef:
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind != RTE_RELATION ||
+ rte->relkind != RELKIND_RELATION ||
+ root->simple_rel_array[ref->rtindex] == NULL)
+ break;
+
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (bms_num_members(relids) > join_collapse_limit)
+ break;
+ }
+ break;
+ default:
+ Assert(false);
+ }
+ }
+
+ if (bms_num_members(relids) >= 2)
+ ToRemove = remove_self_joins_one_level(root, relids, ToRemove);
+
+ bms_free(relids);
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove
+ * unnecessary range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
+}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index c92ddd27ed..8f6d5e5639 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/joininfo.c b/src/backend/optimizer/util/joininfo.c
index d4cffdb198..dd3e323f21 100644
--- a/src/backend/optimizer/util/joininfo.c
+++ b/src/backend/optimizer/util/joininfo.c
@@ -130,6 +130,9 @@ remove_join_clause_from_rels(PlannerInfo *root,
{
RelOptInfo *rel = find_base_rel(root, cur_relid);
+ if (!list_member_ptr(rel->joininfo, restrictinfo))
+ continue;
+
/*
* Remove the restrictinfo from the list. Pointer comparison is
* sufficient.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 520409f4ba..2e73cacc0d 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -40,14 +40,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -600,7 +596,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -707,7 +703,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1059,7 +1055,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1072,9 +1068,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1085,8 +1081,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1095,7 +1091,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1121,7 +1117,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1131,7 +1127,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f70f7f5c01..b1ed9ebebf 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1182,6 +1182,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 620eeda2d6..1effa9721a 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -307,6 +307,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 54a0d4c188..054b77e087 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern double cursor_tuple_fraction;
+extern bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -107,6 +108,7 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern List * remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 19caebabd0..4e9fd2367c 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4922,6 +4922,432 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sl t2
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 442eeb1e3f..fc5be62856 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -119,10 +119,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(20 rows)
+(21 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 6dd01b022e..a7c09d2573 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1753,6 +1753,203 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.25.1
Sigh. And now there's a patch conflict in a regression test expected
output: sysviews.out
Please rebase. Incidentally, make sure to check the expected output is
actually correct. It's easy to "fix" an expected output to
accidentally just memorialize an incorrect output.
Btw, it's the last week before feature freeze so time is of the essence.
On 4/1/22 20:27, Greg Stark wrote:
Sigh. And now there's a patch conflict in a regression test expected
output: sysviews.outPlease rebase. Incidentally, make sure to check the expected output is
actually correct. It's easy to "fix" an expected output to
accidentally just memorialize an incorrect output.Btw, it's the last week before feature freeze so time is of the essence.Thanks,
patch in attachment rebased on current master.
Sorry for late answer.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
v32-0001-Remove-self-joins.patchtext/x-patch; charset=UTF-8; name=v32-0001-Remove-self-joins.patchDownload
From d5fab52bd7e7124d0e557f1eec075a9543c67d29 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.
Remove inner joins of a relation to itself if could prove that the join
can be replaced with a scan. We can prove the uniqueness
using the existing innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. So proved, that this join is self-join and can be replaced by
a scan.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
src/backend/optimizer/plan/analyzejoins.c | 888 +++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/util/joininfo.c | 3 +
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/misc/guc.c | 10 +
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/planmain.h | 2 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 426 +++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 197 +++++
12 files changed, 1583 insertions(+), 29 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..c5ac8e2bd4 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids joinrelids);
+ Relids joinrelids, int subst_relid);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
/*
@@ -86,7 +92,7 @@ restart:
remove_rel_from_query(root, innerrelid,
bms_union(sjinfo->min_lefthand,
- sjinfo->min_righthand));
+ sjinfo->min_righthand), 0);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -300,7 +306,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
*
* We are not terribly thorough here. We must make sure that the rel is
* no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +318,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* lists, but only if they belong to the outer join identified by joinrelids.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+ int subst_relid)
{
RelOptInfo *rel = find_base_rel(root, relid);
List *joininfos;
Index rti;
ListCell *l;
+ Assert(subst_relid == 0 || relid != subst_relid);
+
/*
* Mark the rel as "dead" to show it is no longer part of the join tree.
* (Removing it from the baserel array altogether seems too risky.)
@@ -345,8 +357,11 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ change_relid(otherrel->attr_needed[attroff], relid, subst_relid);
}
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, relid, subst_relid);
}
/*
@@ -361,10 +376,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(l);
- sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, relid);
- sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid);
- sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid);
- sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid);
+ sjinfo->min_lefthand = change_relid(sjinfo->min_lefthand, relid, subst_relid);
+ sjinfo->min_righthand = change_relid(sjinfo->min_righthand, relid, subst_relid);
+ sjinfo->syn_lefthand = change_relid(sjinfo->syn_lefthand, relid, subst_relid);
+ sjinfo->syn_righthand = change_relid(sjinfo->syn_righthand, relid, subst_relid);
+
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, relid, subst_relid);
}
/*
@@ -385,16 +402,29 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (subst_relid == 0 && bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at))
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
else
{
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
+ phinfo->ph_eval_at = change_relid(phinfo->ph_eval_at, relid, subst_relid);
Assert(!bms_is_empty(phinfo->ph_eval_at));
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
+ phinfo->ph_needed = change_relid(phinfo->ph_needed, relid, subst_relid);
+ Assert(subst_relid != 0 || !bms_is_member(relid, phinfo->ph_lateral));
+ phinfo->ph_lateral = change_relid(phinfo->ph_lateral, relid, subst_relid);
+ phinfo->ph_var->phrels = change_relid(phinfo->ph_var->phrels, relid, subst_relid);
+ }
+ }
+
+ if (subst_relid != 0)
+ {
+ foreach(l, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ change_rinfo(rinfo, relid, subst_relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
}
}
@@ -418,6 +448,7 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ change_rinfo(rinfo, relid, subst_relid);
if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
{
@@ -433,6 +464,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
relid);
distribute_restrictinfo_to_rels(root, rinfo);
}
+ else if (subst_relid != 0)
+ distribute_restrictinfo_to_rels(root, rinfo);
}
/*
@@ -1118,3 +1151,832 @@ is_innerrel_unique_for(PlannerInfo *root,
/* Let rel_is_distinct_for() do the hard work */
return rel_is_distinct_for(root, innerrel, clause_list);
}
+
+typedef struct ChangeVarnoContext
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ if (newRelid == 0)
+ return;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static Bitmapset*
+change_relid(Relids relids, Index oldId, Index newId)
+{
+ if (newId == 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal;
+
+ if (to == 0)
+ return;
+
+ is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = change_relid(rinfo->required_relids, from, to);
+ rinfo->left_relids = change_relid(rinfo->left_relids, from, to);
+ rinfo->right_relids = change_relid(rinfo->right_relids, from, to);
+ rinfo->outer_relids = change_relid(rinfo->outer_relids, from, to);
+ rinfo->nullable_relids = change_relid(rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ em->em_relids = change_relid(em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ *sources = list_delete_cell(*sources, cell);
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *cell;
+ int i;
+ List *target = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Now, baserestrictinfo replenished with restrictions from removing
+ * relation. It is needed to remove duplicates and replace degenerated
+ * clauses with a NullTest.
+ */
+ foreach(cell, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ ListCell *otherCell;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+
+ /* Search for duplicates. */
+ foreach(otherCell, target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ target = lappend(target, rinfo);
+ }
+
+ list_free(toKeep->baserestrictinfo);
+ toKeep->baserestrictinfo = target;
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ ec->ec_relids = change_relid(ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, kmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(kmark->rti == kmark->prti);
+
+ rmark->rti = toKeep->relid;
+ rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, toRemove->relid, toKeep->relid);
+ change_varno((Expr *) root->parse->havingQual, toRemove->relid, toKeep->relid);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) ? */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids joinrelids = NULL;
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ if (bms_num_members(relids) < 2)
+ return NULL;
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if (bms_is_member(k, info->syn_lefthand) &&
+ !bms_is_member(r, info->syn_lefthand))
+ jinfo_check = false;
+ else if (bms_is_member(k, info->syn_righthand) &&
+ !bms_is_member(r, info->syn_righthand))
+ jinfo_check = false;
+ else if (bms_is_member(r, info->syn_lefthand) &&
+ !bms_is_member(k, info->syn_lefthand))
+ jinfo_check = false;
+ else if (bms_is_member(r, info->syn_righthand) &&
+ !bms_is_member(k, info->syn_righthand))
+ jinfo_check = false;
+
+ if (!jinfo_check)
+ break;
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX:
+ * we would detect self-join without quals like 'x==x' if we had
+ * an foreign key constraint on some of other quals and this join
+ * haven't any columns from the outer in the target list.
+ * But it is still complex task.
+ */
+ continue;
+ }
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0))
+ continue;
+
+ /* See for row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+
+ /*
+ * Add join restrictions to joininfo of removing relation to simplify
+ * the relids replacing procedure.
+ */
+ outer->joininfo = list_concat(outer->joininfo, restrictlist);
+
+ /* Firstly, replace index of excluding relation with keeping. */
+ remove_rel_from_query(root, outer->relid, joinrelids, inner->relid);
+
+ /* Secondly, fix restrictions of keeping relation */
+ remove_self_join_rel(root, imark, omark, inner, outer);
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+static Relids
+remove_self_joins_one_level(PlannerInfo *root, Relids relids, Relids ToRemove)
+{
+ while (!bms_is_empty(relids))
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int i;
+
+ i = bms_first_member(relids);
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((i = bms_next_member(relids, i)) > 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[i];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, i);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+ return ToRemove;
+}
+
+/*
+ * For each level of joinlist form a set of base relations and launch the
+ * routine of the self-join removal optimization. Recurse into sub-joinlists to
+ * handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *lc;
+ Relids relids = NULL;
+
+ /* Collect the ids of base relations at one level of the join tree. */
+ foreach (lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root,
+ (List *) lfirst(lc),
+ ToRemove);
+ break;
+ case T_RangeTblRef:
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind != RTE_RELATION ||
+ rte->relkind != RELKIND_RELATION ||
+ root->simple_rel_array[ref->rtindex] == NULL)
+ break;
+
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (bms_num_members(relids) > join_collapse_limit)
+ break;
+ }
+ break;
+ default:
+ Assert(false);
+ }
+ }
+
+ if (bms_num_members(relids) >= 2)
+ ToRemove = remove_self_joins_one_level(root, relids, ToRemove);
+
+ bms_free(relids);
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove
+ * unnecessary range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
+}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index c92ddd27ed..8f6d5e5639 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/joininfo.c b/src/backend/optimizer/util/joininfo.c
index d4cffdb198..dd3e323f21 100644
--- a/src/backend/optimizer/util/joininfo.c
+++ b/src/backend/optimizer/util/joininfo.c
@@ -130,6 +130,9 @@ remove_join_clause_from_rels(PlannerInfo *root,
{
RelOptInfo *rel = find_base_rel(root, cur_relid);
+ if (!list_member_ptr(rel->joininfo, restrictinfo))
+ continue;
+
/*
* Remove the restrictinfo from the list. Pointer comparison is
* sufficient.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 520409f4ba..2e73cacc0d 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -40,14 +40,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -600,7 +596,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -707,7 +703,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1059,7 +1055,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1072,9 +1068,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1085,8 +1081,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1095,7 +1091,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1121,7 +1117,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1131,7 +1127,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9e8ab1420d..323d8dd866 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1192,6 +1192,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 6eca547af8..3aa4ca5fb7 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -307,6 +307,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 54a0d4c188..054b77e087 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern double cursor_tuple_fraction;
+extern bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -107,6 +108,7 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern List * remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index bf1a2db2cf..a937fa7ca5 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4924,6 +4924,432 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sl t2
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 4e775af175..c734d5eca7 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 6dd01b022e..a7c09d2573 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1753,6 +1753,203 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.25.1
New version of the feature.
Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
when uniqueness of an inner deduced not from join quals, but from a
baserestrictinfo clauses 'x=const', where x - unique field.
Code, dedicated to solve second issue is controversial, so i attached
delta.txt for quick observation.
Maybe we should return to previous version of code, when we didn't split
restriction list into join quals and base quals?
--
Regards
Andrey Lepikhov
Postgres Professional
Attachments:
v33-0001-Remove-self-joins.patchtext/x-patch; charset=UTF-8; name=v33-0001-Remove-self-joins.patchDownload
From bd84f5b4ee7554be9ead1b1886ed6b9f0f43a5d4 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.
A Self Join Removal (SJR) feature removes inner join of plane table to itself
in an query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
src/backend/optimizer/path/indxpath.c | 31 +
src/backend/optimizer/plan/analyzejoins.c | 1024 ++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/util/joininfo.c | 3 +
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/misc/guc.c | 10 +
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 648 +++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 286 ++++++
14 files changed, 2056 insertions(+), 41 deletions(-)
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 0ef70ad7f1..df6d706770 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3498,8 +3498,24 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Used as the relation_has_unique_index_for, but if extra_clauses doesn't NULL,
+ * return baserestrictinfo clauses which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
+ List *exprs;
Assert(list_length(exprlist) == list_length(oprlist));
@@ -3554,6 +3570,8 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ exprs = NIL;
+
/*
* If the index is not unique, or not immediately enforced, or if it's
* a partial index that doesn't match the query, it's useless here.
@@ -3600,7 +3618,16 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+
matched = true; /* column is unique */
+
+ /* Store a filter for subsequent operations. */
+ if (bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids))
+ exprs = lappend(exprs, rinfo);
+
+ MemoryContextSwitchTo(oldContext);
break;
}
}
@@ -3643,7 +3670,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..68edc9d2d7 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -32,21 +33,40 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * extra_clauses contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for SJ checking procedure: SJ can
+ * be removed if outer relation contains strictly the same set of clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids joinrelids);
+ Relids joinrelids, int subst_relid);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
/*
@@ -86,7 +106,7 @@ restart:
remove_rel_from_query(root, innerrelid,
bms_union(sjinfo->min_lefthand,
- sjinfo->min_righthand));
+ sjinfo->min_righthand), 0);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -287,7 +307,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -300,7 +320,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
*
* We are not terribly thorough here. We must make sure that the rel is
* no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +332,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* lists, but only if they belong to the outer join identified by joinrelids.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+ int subst_relid)
{
RelOptInfo *rel = find_base_rel(root, relid);
List *joininfos;
Index rti;
ListCell *l;
+ Assert(subst_relid == 0 || relid != subst_relid);
+
/*
* Mark the rel as "dead" to show it is no longer part of the join tree.
* (Removing it from the baserel array altogether seems too risky.)
@@ -345,8 +371,11 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ change_relid(otherrel->attr_needed[attroff], relid, subst_relid);
}
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, relid, subst_relid);
}
/*
@@ -361,10 +390,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(l);
- sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, relid);
- sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid);
- sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid);
- sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid);
+ sjinfo->min_lefthand = change_relid(sjinfo->min_lefthand, relid, subst_relid);
+ sjinfo->min_righthand = change_relid(sjinfo->min_righthand, relid, subst_relid);
+ sjinfo->syn_lefthand = change_relid(sjinfo->syn_lefthand, relid, subst_relid);
+ sjinfo->syn_righthand = change_relid(sjinfo->syn_righthand, relid, subst_relid);
+
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, relid, subst_relid);
}
/*
@@ -385,16 +416,29 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (subst_relid == 0 && bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at))
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
else
{
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
+ phinfo->ph_eval_at = change_relid(phinfo->ph_eval_at, relid, subst_relid);
Assert(!bms_is_empty(phinfo->ph_eval_at));
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
+ phinfo->ph_needed = change_relid(phinfo->ph_needed, relid, subst_relid);
+ Assert(subst_relid != 0 || !bms_is_member(relid, phinfo->ph_lateral));
+ phinfo->ph_lateral = change_relid(phinfo->ph_lateral, relid, subst_relid);
+ phinfo->ph_var->phrels = change_relid(phinfo->ph_var->phrels, relid, subst_relid);
+ }
+ }
+
+ if (subst_relid != 0)
+ {
+ foreach(l, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ change_rinfo(rinfo, relid, subst_relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
}
}
@@ -418,6 +462,7 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ change_rinfo(rinfo, relid, subst_relid);
if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
{
@@ -433,6 +478,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
relid);
distribute_restrictinfo_to_rels(root, rinfo);
}
+ else if (subst_relid != 0)
+ distribute_restrictinfo_to_rels(root, rinfo);
}
/*
@@ -636,9 +683,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains right sides of baserestrictinfo clauses looks like
+ * x = const if distinctness is derived from such clauses, not joininfo clause.
+ * Pass NULL for the param value, if it is not interested.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -651,10 +703,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -968,9 +1021,21 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+bool
+innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids, Relids outerrelids,
+ RelOptInfo *innerrel, JoinType jointype,
+ List *restrictlist, bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -992,10 +1057,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1012,7 +1081,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1025,10 +1094,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1074,7 +1148,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1116,5 +1191,904 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+typedef struct ChangeVarnoContext
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ if (newRelid == 0)
+ return;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static Bitmapset*
+change_relid(Relids relids, Index oldId, Index newId)
+{
+ if (newId == 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal;
+
+ if (to == 0)
+ return;
+
+ is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = change_relid(rinfo->required_relids, from, to);
+ rinfo->left_relids = change_relid(rinfo->left_relids, from, to);
+ rinfo->right_relids = change_relid(rinfo->right_relids, from, to);
+ rinfo->outer_relids = change_relid(rinfo->outer_relids, from, to);
+ rinfo->nullable_relids = change_relid(rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ em->em_relids = change_relid(em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ *sources = list_delete_cell(*sources, cell);
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *cell;
+ int i;
+ List *target = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Now, baserestrictinfo replenished with restrictions from removing
+ * relation. It is needed to remove duplicates and replace degenerated
+ * clauses with a NullTest.
+ */
+ foreach(cell, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ ListCell *otherCell;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+
+ /* Search for duplicates. */
+ foreach(otherCell, target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ target = lappend(target, rinfo);
+ }
+
+ list_free(toKeep->baserestrictinfo);
+ toKeep->baserestrictinfo = target;
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ ec->ec_relids = change_relid(ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ /* XXX: Why here we haven't any differences in removing rmark or kmark? */
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, toRemove->relid, toKeep->relid);
+ change_varno((Expr *) root->parse->havingQual, toRemove->relid, toKeep->relid);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) ? */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness [partly] derived from a baserestrictinfo
+ * clause. In this case we have a chance to return the only one row (if such
+ * clauses on both sides of SJ is equal) or nothing - if they are different.
+ */
+static bool
+degenerate_case(RelOptInfo *outer, List *uclauses, Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Const *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ /* Only filters like x == const we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) (rinfo->outer_is_left ?
+ get_rightop(rinfo->clause) : get_leftop(rinfo->clause));
+ c1 = (Const *) (rinfo->outer_is_left ?
+ get_leftop(rinfo->clause) : get_rightop(rinfo->clause));
+ Assert(IsA(c1, Const) && outer->relid > 0 && relid > 0);
+
+ clause = (Expr *) copyObject(clause);
+ change_varno(clause, relid, outer->relid);
+
+ /*
+ * Compare these left and right sides with corresponding sides of
+ * the outer's filters. If no one detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Expr *oclause;
+ Expr *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ oclause = (Expr *) (bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause));
+ c2 = (Expr *) (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(clause, oclause) && equal(c1,c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids joinrelids = NULL;
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ if (bms_num_members(relids) < 2)
+ return NULL;
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if (bms_is_member(k, info->syn_lefthand) &&
+ !bms_is_member(r, info->syn_lefthand))
+ jinfo_check = false;
+ else if (bms_is_member(k, info->syn_righthand) &&
+ !bms_is_member(r, info->syn_righthand))
+ jinfo_check = false;
+ else if (bms_is_member(r, info->syn_lefthand) &&
+ !bms_is_member(k, info->syn_lefthand))
+ jinfo_check = false;
+ else if (bms_is_member(r, info->syn_righthand) &&
+ !bms_is_member(k, info->syn_righthand))
+ jinfo_check = false;
+
+ if (!jinfo_check)
+ break;
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX:
+ * we would detect self-join without quals like 'x==x' if we had
+ * an foreign key constraint on some of other quals and this join
+ * haven't any columns from the outer in the target list.
+ * But it is still complex task.
+ */
+ continue;
+ }
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ if (!degenerate_case(outer, uclauses, inner->relid))
+ continue;
+
+ /* See for row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+
+ /*
+ * Add join restrictions to joininfo of removing relation to simplify
+ * the relids replacing procedure.
+ */
+ outer->joininfo = list_concat(outer->joininfo, restrictlist);
+
+ /* Firstly, replace index of excluding relation with keeping. */
+ remove_rel_from_query(root, outer->relid, joinrelids, inner->relid);
+
+ /* Secondly, fix restrictions of keeping relation */
+ remove_self_join_rel(root, imark, omark, inner, outer);
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+static Relids
+remove_self_joins_one_level(PlannerInfo *root, Relids relids, Relids ToRemove)
+{
+ while (!bms_is_empty(relids))
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int i;
+
+ i = bms_first_member(relids);
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((i = bms_next_member(relids, i)) > 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[i];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, i);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+ return ToRemove;
+}
+
+/*
+ * For each level of joinlist form a set of base relations and launch the
+ * routine of the self-join removal optimization. Recurse into sub-joinlists to
+ * handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *lc;
+ Relids relids = NULL;
+
+ /* Collect the ids of base relations at one level of the join tree. */
+ foreach (lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root,
+ (List *) lfirst(lc),
+ ToRemove);
+ break;
+ case T_RangeTblRef:
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind != RTE_RELATION ||
+ rte->relkind != RELKIND_RELATION ||
+ root->simple_rel_array[ref->rtindex] == NULL)
+ break;
+
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (bms_num_members(relids) > join_collapse_limit)
+ break;
+ }
+ break;
+ default:
+ Assert(false);
+ }
+ }
+
+ if (bms_num_members(relids) >= 2)
+ ToRemove = remove_self_joins_one_level(root, relids, ToRemove);
+
+ bms_free(relids);
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove
+ * unnecessary range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index c92ddd27ed..8f6d5e5639 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/joininfo.c b/src/backend/optimizer/util/joininfo.c
index d4cffdb198..dd3e323f21 100644
--- a/src/backend/optimizer/util/joininfo.c
+++ b/src/backend/optimizer/util/joininfo.c
@@ -130,6 +130,9 @@ remove_join_clause_from_rels(PlannerInfo *root,
{
RelOptInfo *rel = find_base_rel(root, cur_relid);
+ if (!list_member_ptr(rel->joininfo, restrictinfo))
+ continue;
+
/*
* Remove the restrictinfo from the list. Pointer comparison is
* sufficient.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 520409f4ba..2e73cacc0d 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -40,14 +40,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -600,7 +596,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -707,7 +703,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1059,7 +1055,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1072,9 +1068,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1085,8 +1081,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1095,7 +1091,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1121,7 +1117,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1131,7 +1127,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8e9b71375c..d2bdffe373 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1224,6 +1224,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index d2d46b15df..c71e2549a8 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -309,6 +309,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 3d95e6bfc8..b93bc29f0f 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -76,6 +76,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index c4f61c1a09..ef963e4bcf 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -107,6 +108,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List * remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index bf1a2db2cf..ae0bec82ae 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4924,6 +4924,654 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one column
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one column
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no columns
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no columns
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 4e775af175..c734d5eca7 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 6dd01b022e..d132eedf18 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1753,6 +1753,292 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one column
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one column
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no columns
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no columns
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.34.1
delta.txttext/plain; charset=UTF-8; name=delta.txtDownload
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 0ef70ad7f1..df6d706770 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3498,8 +3498,24 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Used as the relation_has_unique_index_for, but if extra_clauses doesn't NULL,
+ * return baserestrictinfo clauses which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
+ List *exprs;
Assert(list_length(exprlist) == list_length(oprlist));
@@ -3554,6 +3570,8 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ exprs = NIL;
+
/*
* If the index is not unique, or not immediately enforced, or if it's
* a partial index that doesn't match the query, it's useless here.
@@ -3600,7 +3618,16 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+
matched = true; /* column is unique */
+
+ /* Store a filter for subsequent operations. */
+ if (bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids))
+ exprs = lappend(exprs, rinfo);
+
+ MemoryContextSwitchTo(oldContext);
break;
}
}
@@ -3643,7 +3670,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index bdd9effd38..68edc9d2d7 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -33,6 +33,19 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * extra_clauses contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for SJ checking procedure: SJ can
+ * be removed if outer relation contains strictly the same set of clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
bool enable_self_join_removal;
/* local functions */
@@ -42,14 +55,15 @@ static void remove_rel_from_query(PlannerInfo *root, int relid,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
@@ -293,7 +307,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -669,9 +683,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains right sides of baserestrictinfo clauses looks like
+ * x = const if distinctness is derived from such clauses, not joininfo clause.
+ * Pass NULL for the param value, if it is not interested.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -684,10 +703,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1001,9 +1021,21 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+bool
+innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids, Relids outerrelids,
+ RelOptInfo *innerrel, JoinType jointype,
+ List *restrictlist, bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1025,10 +1057,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1045,7 +1081,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1058,10 +1094,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1107,7 +1148,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1149,7 +1191,7 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
}
typedef struct ChangeVarnoContext
@@ -1395,6 +1437,7 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
if (leftOp != NULL && equal(leftOp, rightOp))
{
NullTest *nullTest = makeNode(NullTest);
+
nullTest->arg = leftOp;
nullTest->nulltesttype = IS_NOT_NULL;
nullTest->argisrow = false;
@@ -1638,6 +1681,70 @@ split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
*otherjoinquals = ojoinquals;
}
+/*
+ * Check for a case when uniqueness [partly] derived from a baserestrictinfo
+ * clause. In this case we have a chance to return the only one row (if such
+ * clauses on both sides of SJ is equal) or nothing - if they are different.
+ */
+static bool
+degenerate_case(RelOptInfo *outer, List *uclauses, Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Const *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ /* Only filters like x == const we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) (rinfo->outer_is_left ?
+ get_rightop(rinfo->clause) : get_leftop(rinfo->clause));
+ c1 = (Const *) (rinfo->outer_is_left ?
+ get_leftop(rinfo->clause) : get_rightop(rinfo->clause));
+ Assert(IsA(c1, Const) && outer->relid > 0 && relid > 0);
+
+ clause = (Expr *) copyObject(clause);
+ change_varno(clause, relid, outer->relid);
+
+ /*
+ * Compare these left and right sides with corresponding sides of
+ * the outer's filters. If no one detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Expr *oclause;
+ Expr *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ oclause = (Expr *) (bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause));
+ c2 = (Expr *) (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(clause, oclause) && equal(c1,c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
/*
* Find and remove unique self joins in a group of base relations that have
* the same Oid.
@@ -1663,13 +1770,14 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
while ((k = bms_next_member(relids, k)) > 0)
{
RelOptInfo *inner = root->simple_rel_array[k];
- List *restrictlist;
- List *selfjoinquals;
- List *otherjoinquals;
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
ListCell *lc;
bool jinfo_check = true;
PlanRowMark *omark = NULL;
PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
/* A sanity check: the relations have the same Oid. */
Assert(root->simple_rte_array[k]->relid ==
@@ -1745,9 +1853,13 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
* join quals are selfjoin quals. Otherwise we could end up
* putting false negatives in the cache.
*/
- if (!innerrel_is_unique(root, joinrelids, outer->relids,
- inner, JOIN_INNER, selfjoinquals,
- list_length(otherjoinquals) == 0))
+ if (!innerrel_is_unique_ext(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ if (!degenerate_case(outer, uclauses, inner->relid))
continue;
/* See for row marks. */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 3d95e6bfc8..b93bc29f0f 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -76,6 +76,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 86247cb9ce..ef963e4bcf 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -108,6 +108,10 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
extern List * remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9d0674285c..ae0bec82ae 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5099,11 +5099,165 @@ on true;
-> Seq Scan on int8_tbl y
(7 rows)
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one column
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one column
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no columns
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no columns
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
-- Test that OR predicated are updated correctly after join removal
CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
explain (costs off)
-SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
QUERY PLAN
----------------------------------------------------------------------------------
Aggregate
@@ -5240,17 +5394,51 @@ SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
-- We can remove the join even if we find the join can't duplicate rows and
-- the base quals of each side are different. In the following case we end up
-- moving quals over to s1 to make it so it can't match any rows.
-create table sl(a int, b int);
+create table sl(a int, b int, c int);
create unique index on sl(a, b);
vacuum analyze sl;
-- Both sides are unique, but base quals are different
explain (costs off)
select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
- QUERY PLAN
------------------------------------------------------
- Seq Scan on sl t2
- Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
-(2 rows)
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
-- Join qual isn't mergejoinable, but inner is unique.
explain (COSTS OFF)
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 53d0aa98eb..d132eedf18 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1829,11 +1829,69 @@ left join (select coalesce(y.q1, 1) from int8_tbl y
on true) z
on true;
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one column
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one column
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no columns
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no columns
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
-- Test that OR predicated are updated correctly after join removal
CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
explain (costs off)
-SELECT COUNT(*) FROM tab_with_flag WHERE (is_flag IS NULL OR is_flag = 0) AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
DROP TABLE tab_with_flag;
-- HAVING clause
@@ -1886,7 +1944,7 @@ SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
-- We can remove the join even if we find the join can't duplicate rows and
-- the base quals of each side are different. In the following case we end up
-- moving quals over to s1 to make it so it can't match any rows.
-create table sl(a int, b int);
+create table sl(a int, b int, c int);
create unique index on sl(a, b);
vacuum analyze sl;
@@ -1894,6 +1952,20 @@ vacuum analyze sl;
explain (costs off)
select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
-- Join qual isn't mergejoinable, but inner is unique.
explain (COSTS OFF)
SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
Hi,
w.r.t. v33-0001-Remove-self-joins.patch :
removes inner join of plane table -> removes inner join of plain table
in an query plan -> in a query plan
+ * Used as the relation_has_unique_index_for,
Since relation_has_unique_index_for() becomes a wrapper of
relation_has_unique_index_ext, the above sentence doesn't make much sense.
I think you can drop this part.
but if extra_clauses doesn't NULL -> If extra_clauses isn't NULL
+ is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids) ? true : false;
The above can be simplified to:
is_req_equal = rinfo->required_relids == rinfo->clause_relids;
+ ListCell *otherCell;
otherCell should be initialized to NULL.
+ if (bms_is_member(k, info->syn_lefthand) &&
+ !bms_is_member(r, info->syn_lefthand))
+ jinfo_check = false;
+ else if (bms_is_member(k, info->syn_righthand) &&
+ !bms_is_member(r, info->syn_righthand))
+ jinfo_check = false;
+ else if (bms_is_member(r, info->syn_lefthand) &&
+ !bms_is_member(k, info->syn_lefthand))
+ jinfo_check = false;
I think the above code can be simplified:
If bms_is_member(k, info->syn_lefthand) ^ bms_is_member(r,
info->syn_lefthand) is true, jinfo_check is false.
If bms_is_member(k, info->syn_righthand) ^ bms_is_member(r,
info->syn_righthand) is true, jinfo_check is false.
Otherwise jinfo_check is true.
Cheers
Import Notes
Resolved by subject fallback
Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit :
New version of the feature.
Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
when uniqueness of an inner deduced not from join quals, but from a
baserestrictinfo clauses 'x=const', where x - unique field.
Code, dedicated to solve second issue is controversial, so i attached
delta.txt for quick observation.
Maybe we should return to previous version of code, when we didn't split
restriction list into join quals and base quals?
Hello,
I tried to find problematic cases, which would make the planning time grow
unacceptably, and couldn't devise it.
The worst case scenario I could think of was as follows:
- a query with many different self joins
- an abundance of unique indexes on combinations of this table columns to
consider
- additional predicates on the where clause on columns.
The base table I used for this was a table with 40 integers. 39 unique indexes
were defined on every combination of (c1, cX) with cX being columns c2 to c40.
I turned geqo off, set from_collapse_limit and join_collapse_limit to
unreasonably high values (30), and tried to run queries of the form:
SELECT * FROM test_table t1
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX
...
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX.
So no self join can be eliminated in that case.
The performance was very similar with or without the GUC enabled. I tested the
same thing without the patch, since the test for uniqueness has been slightly
altered and incurs a new allocation, but it doesn't seem to change.
One interesting side effect of this patch, is that removing any unneeded self
join cuts down the planification time very significantly, as we lower the number
of combinations to consider.
As for the code:
- Comments on relation_has_unique_index_ext and relation_has_unique_index_for
should be rewritten, as relation_has_unique_index_for is now just a special
case of relation_has_unique_index_ext. By the way, the comment would probably
be better read as: "but if extra_clauses isn't NULL".
- The whole thing about "extra_clauses", ie, baserestrictinfos which were
used to determine uniqueness, is not very clear. Most functions where the new
argument has been added have not seen an update in their comments, and the
name itself doesn't really convey the intented meaning: perhaps
required_non_join_clauses ?
The way this works should be explained a bit more thoroughly, for example in
remove_self_joins_one_group the purpose of uclauses should be explained. The
fact that degenerate_case returns true when we don't have any additional base
restrict info is also confusing, as well as the degenerate_case name.
I'll update if I think of more interesting things to add.
--
Ronan Dunklau
On 5/17/22 19:14, Ronan Dunklau wrote:
Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit :
New version of the feature.
Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
when uniqueness of an inner deduced not from join quals, but from a
baserestrictinfo clauses 'x=const', where x - unique field.
Code, dedicated to solve second issue is controversial, so i attached
delta.txt for quick observation.
Maybe we should return to previous version of code, when we didn't split
restriction list into join quals and base quals?Hello,
I tried to find problematic cases, which would make the planning time grow
unacceptably, and couldn't devise it.The worst case scenario I could think of was as follows:
- a query with many different self joins
- an abundance of unique indexes on combinations of this table columns to
consider
- additional predicates on the where clause on columns.
Looking into the patch I can imagine, that the most difficult case is
when a set of relations with the same OID is huge, but only small part
of them (or nothing) can be removed.
Also, removing a clause from restrictinfo list or from equivalence class
adds non-linear complexity. So, you can dig this way ).
The base table I used for this was a table with 40 integers. 39 unique indexes
were defined on every combination of (c1, cX) with cX being columns c2 to c40.I turned geqo off, set from_collapse_limit and join_collapse_limit to
unreasonably high values (30), and tried to run queries of the form:SELECT * FROM test_table t1
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX
...
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX.So no self join can be eliminated in that case.
I think, you should compare t1.cX with tX.cX to eliminate self-join.
Cross-unique-index proof isn't supported now.
The performance was very similar with or without the GUC enabled. I tested the
same thing without the patch, since the test for uniqueness has been slightly
altered and incurs a new allocation, but it doesn't seem to change.One interesting side effect of this patch, is that removing any unneeded self
join cuts down the planification time very significantly, as we lower the number
of combinations to consider.
Even more - removing a join we improve cardinality estimation.
As for the code:
- Comments on relation_has_unique_index_ext and relation_has_unique_index_for
should be rewritten, as relation_has_unique_index_for is now just a special
case of relation_has_unique_index_ext. By the way, the comment would probably
be better read as: "but if extra_clauses isn't NULL".
- The whole thing about "extra_clauses", ie, baserestrictinfos which were
used to determine uniqueness, is not very clear. Most functions where the new
argument has been added have not seen an update in their comments, and the
name itself doesn't really convey the intented meaning: perhaps
required_non_join_clauses ?The way this works should be explained a bit more thoroughly, for example in
remove_self_joins_one_group the purpose of uclauses should be explained. The
fact that degenerate_case returns true when we don't have any additional base
restrict info is also confusing, as well as the degenerate_case name.
Agree,
but after this case thoughts wander in my head: should we make one step
back to pre-[1]/messages/by-id/raw/CAApHDvpggnFMC4yP-jUO7PKN=fXeErW5bOxisvJ0HvkHQEY=Ww@mail.gmail.com approach? It looks like we have quite similar changes,
but without special function for a 'degenerate case' detection and
restrictlist splitting.
I'll update if I think of more interesting things to add.
Thank you for your efforts!
See in attachment next version which fixes mistakes detected by
zyu@yugabyte.com.
[1]: /messages/by-id/raw/CAApHDvpggnFMC4yP-jUO7PKN=fXeErW5bOxisvJ0HvkHQEY=Ww@mail.gmail.com
/messages/by-id/raw/CAApHDvpggnFMC4yP-jUO7PKN=fXeErW5bOxisvJ0HvkHQEY=Ww@mail.gmail.com
--
Regards
Andrey Lepikhov
Postgres Professional
Attachments:
v34-0001-Remove-self-joins.patchtext/x-patch; charset=UTF-8; name=v34-0001-Remove-self-joins.patchDownload
From 1774b4bcfe337b151b76c7f357dc6748755bf1d9 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.
A Self Join Removal (SJR) feature removes inner join of plane table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
src/backend/optimizer/path/indxpath.c | 31 +
src/backend/optimizer/plan/analyzejoins.c | 1016 ++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/util/joininfo.c | 3 +
src/backend/optimizer/util/relnode.c | 26 +-
src/backend/utils/misc/guc.c | 10 +
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 648 +++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 286 ++++++
14 files changed, 2048 insertions(+), 41 deletions(-)
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 0ef70ad7f1..3690f7fead 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3498,8 +3498,24 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
+ List *exprs;
Assert(list_length(exprlist) == list_length(oprlist));
@@ -3554,6 +3570,8 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ exprs = NIL;
+
/*
* If the index is not unique, or not immediately enforced, or if it's
* a partial index that doesn't match the query, it's useless here.
@@ -3600,7 +3618,16 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
+ MemoryContext oldContext = MemoryContextSwitchTo(root->planner_cxt);
+
matched = true; /* column is unique */
+
+ /* Store a filter for subsequent operations. */
+ if (bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids))
+ exprs = lappend(exprs, rinfo);
+
+ MemoryContextSwitchTo(oldContext);
break;
}
}
@@ -3643,7 +3670,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..a78ba3daeb 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -32,21 +33,40 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * extra_clauses contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for SJ checking procedure: SJ can
+ * be removed if outer relation contains strictly the same set of clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids joinrelids);
+ Relids joinrelids, int subst_relid);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
/*
@@ -86,7 +106,7 @@ restart:
remove_rel_from_query(root, innerrelid,
bms_union(sjinfo->min_lefthand,
- sjinfo->min_righthand));
+ sjinfo->min_righthand), 0);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -287,7 +307,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -300,7 +320,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
*
* We are not terribly thorough here. We must make sure that the rel is
* no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +332,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* lists, but only if they belong to the outer join identified by joinrelids.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+ int subst_relid)
{
RelOptInfo *rel = find_base_rel(root, relid);
List *joininfos;
Index rti;
ListCell *l;
+ Assert(subst_relid == 0 || relid != subst_relid);
+
/*
* Mark the rel as "dead" to show it is no longer part of the join tree.
* (Removing it from the baserel array altogether seems too risky.)
@@ -345,8 +371,11 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ change_relid(otherrel->attr_needed[attroff], relid, subst_relid);
}
+
+ /* Update lateral references. */
+ change_varno((Expr*)otherrel->lateral_vars, relid, subst_relid);
}
/*
@@ -361,10 +390,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(l);
- sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, relid);
- sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid);
- sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid);
- sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid);
+ sjinfo->min_lefthand = change_relid(sjinfo->min_lefthand, relid, subst_relid);
+ sjinfo->min_righthand = change_relid(sjinfo->min_righthand, relid, subst_relid);
+ sjinfo->syn_lefthand = change_relid(sjinfo->syn_lefthand, relid, subst_relid);
+ sjinfo->syn_righthand = change_relid(sjinfo->syn_righthand, relid, subst_relid);
+
+ change_varno((Expr*)sjinfo->semi_rhs_exprs, relid, subst_relid);
}
/*
@@ -385,16 +416,29 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (subst_relid == 0 && bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at))
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
else
{
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
+ phinfo->ph_eval_at = change_relid(phinfo->ph_eval_at, relid, subst_relid);
Assert(!bms_is_empty(phinfo->ph_eval_at));
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
+ phinfo->ph_needed = change_relid(phinfo->ph_needed, relid, subst_relid);
+ Assert(subst_relid != 0 || !bms_is_member(relid, phinfo->ph_lateral));
+ phinfo->ph_lateral = change_relid(phinfo->ph_lateral, relid, subst_relid);
+ phinfo->ph_var->phrels = change_relid(phinfo->ph_var->phrels, relid, subst_relid);
+ }
+ }
+
+ if (subst_relid != 0)
+ {
+ foreach(l, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ change_rinfo(rinfo, relid, subst_relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
}
}
@@ -418,6 +462,7 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ change_rinfo(rinfo, relid, subst_relid);
if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
{
@@ -433,6 +478,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
relid);
distribute_restrictinfo_to_rels(root, rinfo);
}
+ else if (subst_relid != 0)
+ distribute_restrictinfo_to_rels(root, rinfo);
}
/*
@@ -636,9 +683,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains right sides of baserestrictinfo clauses looks like
+ * x = const if distinctness is derived from such clauses, not joininfo clause.
+ * Pass NULL for the param value, if it is not interested.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -651,10 +703,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -968,9 +1021,21 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+bool
+innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids, Relids outerrelids,
+ RelOptInfo *innerrel, JoinType jointype,
+ List *restrictlist, bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -992,10 +1057,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1012,7 +1081,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1025,10 +1094,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1074,7 +1148,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1116,5 +1191,896 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+typedef struct ChangeVarnoContext
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var* var = (Var*)node;
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo*)node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ if (newRelid == 0)
+ return;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static Bitmapset*
+change_relid(Relids relids, Index oldId, Index newId)
+{
+ if (newId == 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+static void
+change_rinfo(RestrictInfo* rinfo, Index from, Index to)
+{
+ bool is_req_equal;
+
+ if (to == 0)
+ return;
+
+ is_req_equal = (rinfo->required_relids == rinfo->clause_relids);
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = change_relid(rinfo->required_relids, from, to);
+ rinfo->left_relids = change_relid(rinfo->left_relids, from, to);
+ rinfo->right_relids = change_relid(rinfo->right_relids, from, to);
+ rinfo->outer_relids = change_relid(rinfo->outer_relids, from, to);
+ rinfo->nullable_relids = change_relid(rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members); )
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ em->em_relids = change_relid(em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it
+ * is, delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(*sources); )
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ *sources = list_delete_cell(*sources, cell);
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *cell;
+ int i;
+ List *target = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass mentions
+ * of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Now, baserestrictinfo replenished with restrictions from removing
+ * relation. It is needed to remove duplicates and replace degenerated
+ * clauses with a NullTest.
+ */
+ foreach(cell, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ ListCell *otherCell = NULL;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+
+ /* Search for duplicates. */
+ foreach(otherCell, target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ target = lappend(target, rinfo);
+ }
+
+ list_free(toKeep->baserestrictinfo);
+ toKeep->baserestrictinfo = target;
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ ec->ec_relids = change_relid(ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach (cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ /* XXX: Why here we haven't any differences in removing rmark or kmark? */
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch(rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr*)rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr*)rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr*)rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, toRemove->relid, toKeep->relid);
+ change_varno((Expr *) root->parse->havingQual, toRemove->relid, toKeep->relid);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) ? */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness [partly] derived from a baserestrictinfo
+ * clause. In this case we have a chance to return the only one row (if such
+ * clauses on both sides of SJ is equal) or nothing - if they are different.
+ */
+static bool
+degenerate_case(RelOptInfo *outer, List *uclauses, Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Const *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ /* Only filters like x == const we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) (rinfo->outer_is_left ?
+ get_rightop(rinfo->clause) : get_leftop(rinfo->clause));
+ c1 = (Const *) (rinfo->outer_is_left ?
+ get_leftop(rinfo->clause) : get_rightop(rinfo->clause));
+ Assert(IsA(c1, Const) && outer->relid > 0 && relid > 0);
+
+ clause = (Expr *) copyObject(clause);
+ change_varno(clause, relid, outer->relid);
+
+ /*
+ * Compare these left and right sides with corresponding sides of
+ * the outer's filters. If no one detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Expr *oclause;
+ Expr *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ oclause = (Expr *) (bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause));
+ c2 = (Expr *) (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(clause, oclause) && equal(c1,c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids joinrelids = NULL;
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ if (bms_num_members(relids) < 2)
+ return NULL;
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /* Is it a unique self join? */
+ restrictlist = build_joinrel_restrictlist(root, joinrelids, outer,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX:
+ * we would detect self-join without quals like 'x==x' if we had
+ * an foreign key constraint on some of other quals and this join
+ * haven't any columns from the outer in the target list.
+ * But it is still complex task.
+ */
+ continue;
+ }
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ if (!degenerate_case(outer, uclauses, inner->relid))
+ continue;
+
+ /* See for row marks. */
+ foreach (lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+
+ /*
+ * Add join restrictions to joininfo of removing relation to simplify
+ * the relids replacing procedure.
+ */
+ outer->joininfo = list_concat(outer->joininfo, restrictlist);
+
+ /* Firstly, replace index of excluding relation with keeping. */
+ remove_rel_from_query(root, outer->relid, joinrelids, inner->relid);
+
+ /* Secondly, fix restrictions of keeping relation */
+ remove_self_join_rel(root, imark, omark, inner, outer);
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+static Relids
+remove_self_joins_one_level(PlannerInfo *root, Relids relids, Relids ToRemove)
+{
+ while (!bms_is_empty(relids))
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int i;
+
+ i = bms_first_member(relids);
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((i = bms_next_member(relids, i)) > 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[i];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, i);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+ return ToRemove;
+}
+
+/*
+ * For each level of joinlist form a set of base relations and launch the
+ * routine of the self-join removal optimization. Recurse into sub-joinlists to
+ * handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *lc;
+ Relids relids = NULL;
+
+ /* Collect the ids of base relations at one level of the join tree. */
+ foreach (lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root,
+ (List *) lfirst(lc),
+ ToRemove);
+ break;
+ case T_RangeTblRef:
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind != RTE_RELATION ||
+ rte->relkind != RELKIND_RELATION ||
+ root->simple_rel_array[ref->rtindex] == NULL)
+ break;
+
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (bms_num_members(relids) > join_collapse_limit)
+ break;
+ }
+ break;
+ default:
+ Assert(false);
+ }
+ }
+
+ if (bms_num_members(relids) >= 2)
+ ToRemove = remove_self_joins_one_level(root, relids, ToRemove);
+
+ bms_free(relids);
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove
+ * unnecessary range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index c92ddd27ed..8f6d5e5639 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/util/joininfo.c b/src/backend/optimizer/util/joininfo.c
index d4cffdb198..dd3e323f21 100644
--- a/src/backend/optimizer/util/joininfo.c
+++ b/src/backend/optimizer/util/joininfo.c
@@ -130,6 +130,9 @@ remove_join_clause_from_rels(PlannerInfo *root,
{
RelOptInfo *rel = find_base_rel(root, cur_relid);
+ if (!list_member_ptr(rel->joininfo, restrictinfo))
+ continue;
+
/*
* Remove the restrictinfo from the list. Pointer comparison is
* sufficient.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 520409f4ba..2e73cacc0d 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -40,14 +40,10 @@ typedef struct JoinHashEntry
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel);
-static List *build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
- RelOptInfo *outer_rel,
- RelOptInfo *inner_rel);
static void build_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel);
-static List *subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+static List *subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist);
static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
@@ -600,7 +596,7 @@ build_join_rel(PlannerInfo *root,
*/
if (restrictlist_ptr)
*restrictlist_ptr = build_joinrel_restrictlist(root,
- joinrel,
+ joinrel->relids,
outer_rel,
inner_rel);
return joinrel;
@@ -707,7 +703,7 @@ build_join_rel(PlannerInfo *root,
* caller might or might not need the restrictlist, but I need it anyway
* for set_joinrel_size_estimates().)
*/
- restrictlist = build_joinrel_restrictlist(root, joinrel,
+ restrictlist = build_joinrel_restrictlist(root, joinrel->relids,
outer_rel, inner_rel);
if (restrictlist_ptr)
*restrictlist_ptr = restrictlist;
@@ -1059,7 +1055,7 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* the various joinlist entries ultimately refer to RestrictInfos
* pushed into them by distribute_restrictinfo_to_rels().
*
- * 'joinrel' is a join relation node
+ * 'joinrelids' is a join relation id set
* 'outer_rel' and 'inner_rel' are a pair of relations that can be joined
* to form joinrel.
*
@@ -1072,9 +1068,9 @@ build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
* RestrictInfo nodes are no longer context-dependent. Instead, just include
* the original nodes in the lists made for the join relation.
*/
-static List *
+List *
build_joinrel_restrictlist(PlannerInfo *root,
- RelOptInfo *joinrel,
+ Relids joinrelids,
RelOptInfo *outer_rel,
RelOptInfo *inner_rel)
{
@@ -1085,8 +1081,8 @@ build_joinrel_restrictlist(PlannerInfo *root,
* eliminating any duplicates (important since we will see many of the
* same clauses arriving from both input relations).
*/
- result = subbuild_joinrel_restrictlist(joinrel, outer_rel->joininfo, NIL);
- result = subbuild_joinrel_restrictlist(joinrel, inner_rel->joininfo, result);
+ result = subbuild_joinrel_restrictlist(joinrelids, outer_rel->joininfo, NIL);
+ result = subbuild_joinrel_restrictlist(joinrelids, inner_rel->joininfo, result);
/*
* Add on any clauses derived from EquivalenceClasses. These cannot be
@@ -1095,7 +1091,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
*/
result = list_concat(result,
generate_join_implied_equalities(root,
- joinrel->relids,
+ joinrelids,
outer_rel->relids,
inner_rel));
@@ -1121,7 +1117,7 @@ build_joinrel_joinlist(RelOptInfo *joinrel,
}
static List *
-subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
+subbuild_joinrel_restrictlist(Relids joinrelids,
List *joininfo_list,
List *new_restrictlist)
{
@@ -1131,7 +1127,7 @@ subbuild_joinrel_restrictlist(RelOptInfo *joinrel,
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
- if (bms_is_subset(rinfo->required_relids, joinrel->relids))
+ if (bms_is_subset(rinfo->required_relids, joinrelids))
{
/*
* This clause becomes a restriction clause for the joinrel, since
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8e9b71375c..d2bdffe373 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1224,6 +1224,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index d2d46b15df..c71e2549a8 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -309,6 +309,10 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo,
List **restrictlist_ptr);
+extern List *build_joinrel_restrictlist(PlannerInfo *root,
+ Relids joinrelids,
+ RelOptInfo *outer_rel,
+ RelOptInfo *inner_rel);
extern Relids min_join_parameterization(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index b6e137cf83..d0da17ea07 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -76,6 +76,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index c4f61c1a09..ef963e4bcf 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -107,6 +108,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List * remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 2538bd6a79..1d7150cf1e 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4918,6 +4918,654 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one column
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one column
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no columns
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no columns
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 4e775af175..c734d5eca7 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index a27a72086e..cca31896d0 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1753,6 +1753,292 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one column
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one column
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no columns
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no columns
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
--
2.34.1
Le jeudi 19 mai 2022, 12:48:18 CEST Andrey Lepikhov a écrit :
On 5/17/22 19:14, Ronan Dunklau wrote:
Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit :
New version of the feature.
Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
when uniqueness of an inner deduced not from join quals, but from a
baserestrictinfo clauses 'x=const', where x - unique field.
Code, dedicated to solve second issue is controversial, so i attached
delta.txt for quick observation.
Maybe we should return to previous version of code, when we didn't split
restriction list into join quals and base quals?Hello,
I tried to find problematic cases, which would make the planning time grow
unacceptably, and couldn't devise it.The worst case scenario I could think of was as follows:
- a query with many different self joins
- an abundance of unique indexes on combinations of this table columns
toconsider
- additional predicates on the where clause on columns.
Looking into the patch I can imagine, that the most difficult case is
when a set of relations with the same OID is huge, but only small part
of them (or nothing) can be removed.
Also, removing a clause from restrictinfo list or from equivalence class
adds non-linear complexity. So, you can dig this way ).The base table I used for this was a table with 40 integers. 39 unique
indexes were defined on every combination of (c1, cX) with cX being
columns c2 to c40.I turned geqo off, set from_collapse_limit and join_collapse_limit to
unreasonably high values (30), and tried to run queries of the form:SELECT * FROM test_table t1
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX
...
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX.So no self join can be eliminated in that case.
I think, you should compare t1.cX with tX.cX to eliminate self-join.
Cross-unique-index proof isn't supported now.
Yes, that's the point. I wanted to try to introduce as much complexity as I
could, without actually performing any self join elimination. The idea was to
try to come up with the worst case scenario.
The performance was very similar with or without the GUC enabled. I tested
the same thing without the patch, since the test for uniqueness has been
slightly altered and incurs a new allocation, but it doesn't seem to
change.One interesting side effect of this patch, is that removing any unneeded
self join cuts down the planification time very significantly, as we
lower the number of combinations to consider.Even more - removing a join we improve cardinality estimation.
As for the code:
- Comments on relation_has_unique_index_ext and
relation_has_unique_index_for>
should be rewritten, as relation_has_unique_index_for is now just a
special
case of relation_has_unique_index_ext. By the way, the comment would
probably be better read as: "but if extra_clauses isn't NULL".- The whole thing about "extra_clauses", ie, baserestrictinfos which
wereused to determine uniqueness, is not very clear. Most functions where the
new argument has been added have not seen an update in their comments,
and the name itself doesn't really convey the intented meaning: perhaps
required_non_join_clauses ?The way this works should be explained a bit more thoroughly, for example
in remove_self_joins_one_group the purpose of uclauses should be
explained. The fact that degenerate_case returns true when we don't have
any additional base restrict info is also confusing, as well as the
degenerate_case name.Agree,
but after this case thoughts wander in my head: should we make one step
back to pre-[1] approach? It looks like we have quite similar changes,
but without special function for a 'degenerate case' detection and
restrictlist splitting.
I'll take a look at that one.
I'll update if I think of more interesting things to add.
Thank you for your efforts!
See in attachment next version which fixes mistakes detected by
zyu@yugabyte.com.[1]
/messages/by-id/raw/CAApHDvpggnFMC4yP-jUO7PKN=fXeErW
5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
--
Ronan Dunklau
On 19/5/2022 16:47, Ronan Dunklau wrote:
I'll take a look at that one.
New version of the patch, rebased on current master:
1. pgindent over the patch have passed.
2. number of changed files is reduced.
3. Some documentation and comments is added.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
v35-0001-Remove-self-joins.patchtext/plain; charset=UTF-8; name=v35-0001-Remove-self-joins.patchDownload
From 9ce71f1d0ffefa9d77edfa30fc189bc0425ebbbe Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.
A Self Join Removal (SJR) feature removes inner join of plane table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1045 ++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/utils/misc/guc.c | 10 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 686 ++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 305 ++++++
src/tools/pgindent/typedefs.list | 2 +
13 files changed, 2142 insertions(+), 26 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 48478b1024..2226117c62 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5287,6 +5287,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ query tree and replaces self joins with semantically equivalent single
+ scans. Take into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 0ef70ad7f1..2eb05c79ce 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3498,8 +3498,24 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
+ List *exprs;
Assert(list_length(exprlist) == list_length(oprlist));
@@ -3554,6 +3570,8 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ exprs = NIL;
+
/*
* If the index is not unique, or not immediately enforced, or if it's
* a partial index that doesn't match the query, it's useless here.
@@ -3601,6 +3619,23 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniquiness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3643,7 +3678,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..b44526b1a1 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -32,21 +33,40 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * extra_clauses contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for SJ checking procedure: SJ can
+ * be removed if outer relation contains strictly the same set of clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids joinrelids);
+ Relids joinrelids, int subst_relid);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static void change_rinfo(RestrictInfo *rinfo, Index from, Index to);
+static Bitmapset *change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
/*
@@ -86,7 +106,7 @@ restart:
remove_rel_from_query(root, innerrelid,
bms_union(sjinfo->min_lefthand,
- sjinfo->min_righthand));
+ sjinfo->min_righthand), 0);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -287,7 +307,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -300,7 +320,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
*
* We are not terribly thorough here. We must make sure that the rel is
* no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +332,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* lists, but only if they belong to the outer join identified by joinrelids.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+ int subst_relid)
{
RelOptInfo *rel = find_base_rel(root, relid);
List *joininfos;
Index rti;
ListCell *l;
+ Assert(subst_relid == 0 || relid != subst_relid);
+
/*
* Mark the rel as "dead" to show it is no longer part of the join tree.
* (Removing it from the baserel array altogether seems too risky.)
@@ -345,8 +371,11 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ change_relid(otherrel->attr_needed[attroff], relid, subst_relid);
}
+
+ /* Update lateral references. */
+ change_varno((Expr *) otherrel->lateral_vars, relid, subst_relid);
}
/*
@@ -361,10 +390,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(l);
- sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, relid);
- sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid);
- sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid);
- sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid);
+ sjinfo->min_lefthand = change_relid(sjinfo->min_lefthand, relid, subst_relid);
+ sjinfo->min_righthand = change_relid(sjinfo->min_righthand, relid, subst_relid);
+ sjinfo->syn_lefthand = change_relid(sjinfo->syn_lefthand, relid, subst_relid);
+ sjinfo->syn_righthand = change_relid(sjinfo->syn_righthand, relid, subst_relid);
+
+ change_varno((Expr *) sjinfo->semi_rhs_exprs, relid, subst_relid);
}
/*
@@ -385,16 +416,29 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (subst_relid == 0 && bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at))
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
else
{
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
+ phinfo->ph_eval_at = change_relid(phinfo->ph_eval_at, relid, subst_relid);
Assert(!bms_is_empty(phinfo->ph_eval_at));
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
+ phinfo->ph_needed = change_relid(phinfo->ph_needed, relid, subst_relid);
+ Assert(subst_relid != 0 || !bms_is_member(relid, phinfo->ph_lateral));
+ phinfo->ph_lateral = change_relid(phinfo->ph_lateral, relid, subst_relid);
+ phinfo->ph_var->phrels = change_relid(phinfo->ph_var->phrels, relid, subst_relid);
+ }
+ }
+
+ if (subst_relid != 0)
+ {
+ foreach(l, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ change_rinfo(rinfo, relid, subst_relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
}
}
@@ -419,6 +463,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ /*
+ * replace relid with relid of keeping relation. After that, this
+ * restriction can be attached to keeping relation.
+ */
+ change_rinfo(rinfo, relid, subst_relid);
+
if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
{
/* Recheck that qual doesn't actually reference the target rel */
@@ -433,6 +483,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
relid);
distribute_restrictinfo_to_rels(root, rinfo);
}
+ else if (subst_relid != 0)
+ distribute_restrictinfo_to_rels(root, rinfo);
}
/*
@@ -636,9 +688,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains right sides of baserestrictinfo clauses looks like
+ * x = const if distinctness is derived from such clauses, not joininfo clause.
+ * Pass NULL for the param value, if it is not interested.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -651,10 +708,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -968,9 +1026,21 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+bool
+innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids, Relids outerrelids,
+ RelOptInfo *innerrel, JoinType jointype,
+ List *restrictlist, bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -992,10 +1062,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1012,7 +1086,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1025,10 +1099,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1074,7 +1153,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1116,5 +1196,920 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+typedef struct ChangeVarnoContext
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo *) node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ if (newRelid == 0)
+ return;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static Bitmapset *
+change_relid(Relids relids, Index oldId, Index newId)
+{
+ if (newId == 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+static void
+change_rinfo(RestrictInfo *rinfo, Index from, Index to)
+{
+ bool is_req_equal;
+
+ if (to == 0)
+ return;
+
+ is_req_equal = (rinfo->required_relids == rinfo->clause_relids);
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = change_relid(rinfo->required_relids, from, to);
+ rinfo->left_relids = change_relid(rinfo->left_relids, from, to);
+ rinfo->right_relids = change_relid(rinfo->right_relids, from, to);
+ rinfo->outer_relids = change_relid(rinfo->outer_relids, from, to);
+ rinfo->nullable_relids = change_relid(rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members);)
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ em->em_relids = change_relid(em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it is,
+ * delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(*sources);)
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ *sources = list_delete_cell(*sources, cell);
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *cell;
+ int i;
+ List *target = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass
+ * mentions of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Now, baserestrictinfo replenished with restrictions from removing
+ * relation. It is needed to remove duplicates and replace degenerated
+ * clauses with a NullTest.
+ */
+ foreach(cell, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ ListCell *otherCell = NULL;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+
+ /* Search for duplicates. */
+ foreach(otherCell, target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ target = lappend(target, rinfo);
+ }
+
+ list_free(toKeep->baserestrictinfo);
+ toKeep->baserestrictinfo = target;
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ ec->ec_relids = change_relid(ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach(cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ /*
+ * XXX: Why here we haven't any differences in removing rmark or
+ * kmark?
+ */
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch (rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr *) rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr *) rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr *) rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, toRemove->relid, toKeep->relid);
+ change_varno((Expr *) root->parse->havingQual, toRemove->relid, toKeep->relid);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) ? */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness [partly] derived from a baserestrictinfo
+ * clause. In this case we have a chance to return only one row (if such
+ * clauses on both sides of SJ is equal) or nothing - if they are different.
+ */
+static bool
+match_unique_clauses(RelOptInfo *outer, List *uclauses, Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Expr *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like x == const we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ change_varno(clause, relid, outer->relid);
+
+ c1 = (Expr *) (rinfo->outer_is_left ?
+ get_leftop(clause) : get_rightop(clause));
+ clause = (Expr *) (rinfo->outer_is_left ?
+ get_rightop(clause) : get_leftop(clause));
+
+ /*
+ * Compare these left and right sides with corresponding sides of the
+ * outer's filters. If no one detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Expr *oclause;
+ Expr *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ oclause = (Expr *) (bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause));
+ c2 = (Expr *) (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(clause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids joinrelids = NULL;
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ if (bms_num_members(relids) < 2)
+ return NULL;
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * At this stage joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence on this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ outer->relids,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ /* Is it an unique self join? */
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX: we would detect self-join without quals like 'x==x' if
+ * we had an foreign key constraint on some of other quals and
+ * this join haven't any columns from the outer in the target
+ * list. But it is still complex task.
+ */
+ continue;
+ }
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of join.
+ */
+ if (!match_unique_clauses(outer, uclauses, inner->relid))
+ continue;
+
+ /* See for row marks. */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+
+ /*
+ * Attach the restrictions to removing relation to simplify relids
+ * replacing procedure. joininfo clauses of removing relation will
+ * be moved to the keeping by the remove_rel_from_query().
+ */
+ outer->baserestrictinfo = list_concat(outer->baserestrictinfo,
+ restrictlist);
+
+ /* Firstly, replace index of excluding relation with keeping. */
+ remove_rel_from_query(root, outer->relid, joinrelids, inner->relid);
+
+ /* Secondly, fix restrictions of keeping relation */
+ remove_self_join_rel(root, imark, omark, inner, outer);
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+static Relids
+remove_self_joins_one_level(PlannerInfo *root, Relids relids, Relids ToRemove)
+{
+ while (!bms_is_empty(relids))
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int i;
+
+ i = bms_first_member(relids);
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((i = bms_next_member(relids, i)) > 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[i];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, i);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+ return ToRemove;
+}
+
+/*
+ * For each level of joinlist form a set of base relations and launch the
+ * routine of the self-join removal optimization. Recurse into sub-joinlists to
+ * handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *lc;
+ Relids relids = NULL;
+
+ /* Collect the ids of base relations at one level of the join tree. */
+ foreach(lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root,
+ (List *) lfirst(lc),
+ ToRemove);
+ break;
+ case T_RangeTblRef:
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind != RTE_RELATION ||
+ rte->relkind != RELKIND_RELATION ||
+ root->simple_rel_array[ref->rtindex] == NULL)
+ break;
+
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+
+ /*
+ * Limit the number of joins we process to control the
+ * quadratic behavior.
+ */
+ if (bms_num_members(relids) > join_collapse_limit)
+ break;
+ }
+ break;
+ default:
+ Assert(false);
+ }
+ }
+
+ if (bms_num_members(relids) >= 2)
+ ToRemove = remove_self_joins_one_level(root, relids, ToRemove);
+
+ bms_free(relids);
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index c92ddd27ed..8f6d5e5639 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -226,6 +226,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a7cc49898b..dad726a24d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1224,6 +1224,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index b6e137cf83..d0da17ea07 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -76,6 +76,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index ade32a6f44..929ef979df 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -113,6 +114,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 2538bd6a79..d4d7c390a1 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4918,6 +4918,692 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+explain (verbose, costs off) -- Remove SJ, define uniqueness by a constant expression
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b
+ AND j1.a = EXTRACT(DAY FROM current_timestamp(0))::int
+ AND EXTRACT(DAY FROM current_timestamp(0))::int = j2.a;
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------
+ Seq Scan on public.sj j2
+ Output: j2.a, j2.b, j2.c, j2.a, j2.b, j2.c
+ Filter: ((j2.b IS NOT NULL) AND (j2.a = (EXTRACT(day FROM CURRENT_TIMESTAMP(0)))::integer))
+(3 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = EXTRACT(DAY FROM current_timestamp(0))::int
+ AND EXTRACT(DAY FROM current_timestamp(0))::int = j2.a; -- Return one column
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 4e775af175..c734d5eca7 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index a27a72086e..ae49115e8a 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1753,6 +1753,311 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+
+explain (verbose, costs off) -- Remove SJ, define uniqueness by a constant expression
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b
+ AND j1.a = EXTRACT(DAY FROM current_timestamp(0))::int
+ AND EXTRACT(DAY FROM current_timestamp(0))::int = j2.a;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = EXTRACT(DAY FROM current_timestamp(0))::int
+ AND EXTRACT(DAY FROM current_timestamp(0))::int = j2.a; -- Return one column
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4fb746930a..faf1d8e0ba 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -362,6 +362,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ChangeVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -3757,6 +3758,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.37.0
Le jeudi 30 juin 2022, 16:11:51 CEST Andrey Lepikhov a écrit :
On 19/5/2022 16:47, Ronan Dunklau wrote:
I'll take a look at that one.
New version of the patch, rebased on current master:
1. pgindent over the patch have passed.
2. number of changed files is reduced.
3. Some documentation and comments is added.
Hello Andrey,
Thanks for the updates.
The general approach seems sensible to me, so I'm going to focus on some details.
In a very recent thread [1]/messages/by-id/830269.1656693747@sss.pgh.pa.us, Tom Lane is proposing to add infrastructure to make Var aware of their nullability by outer joins. I wonder if that would help with avoiding the need for adding is not null clauses when the column is known not null ?
If we have a precedent for adding a BitmapSet to the Var itself, maybe the whole discussion regarding keeping track of nullability can be extended to the original column nullability ?
Also, I saw it was mentioned earlier in the thread but how difficult would it be to process the transformed quals through the EquivalenceClass machinery and the qual simplification ?
For example, if the target audience of this patch is ORM, or inlined views, it wouldn't surprise me to see queries of this kind in the wild, which could be avoided altogether:
postgres=# explain (costs off) select * from sj s1 join sj s2 on s1.a = s2.a where s1.b = 2 and s2.b =3;
QUERY PLAN
-----------------------------------------------------
Seq Scan on sj s2
Filter: ((a IS NOT NULL) AND (b = 3) AND (b = 2))
(2 lignes)
+ for (counter = 0; counter < list_length(*sources);)
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
....
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
Why don't you use foreach() and foreach_delete_current macros for iterating and removing items in the lists, both in update_ec_members and update_ec_sources ?
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
I think this is more compact and easier to follow than the previous version, but I'm not sure how common it is in postgres source code to use that kind of construction ?
Some review about the comments:
I see you keep using the terms "the keeping relation" and "the removing relation" in reference to the relation that is kept and the one that is removed.
Aside from the grammar (the kept relation or the removed relation), maybe it would make it clearer to call them something else. In other parts of the code, you used "the remaining relation / the removed relation" which makes sense.
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
This could be rephrased: ", optionally replacing it with another relid. The latter is used by the self-join removing process."
[1]: /messages/by-id/830269.1656693747@sss.pgh.pa.us
--
Ronan Dunklau
On Mon, Jul 4, 2022 at 6:52 AM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
Le jeudi 30 juin 2022, 16:11:51 CEST Andrey Lepikhov a écrit :
On 19/5/2022 16:47, Ronan Dunklau wrote:
I'll take a look at that one.
New version of the patch, rebased on current master:
1. pgindent over the patch have passed.
2. number of changed files is reduced.
3. Some documentation and comments is added.Hello Andrey,
Thanks for the updates.
The general approach seems sensible to me, so I'm going to focus on some
details.In a very recent thread [1], Tom Lane is proposing to add infrastructure
to make Var aware of their nullability by outer joins. I wonder if that
would help with avoiding the need for adding is not null clauses when the
column is known not null ?
If we have a precedent for adding a BitmapSet to the Var itself, maybe the
whole discussion regarding keeping track of nullability can be extended to
the original column nullability ?Also, I saw it was mentioned earlier in the thread but how difficult would
it be to process the transformed quals through the EquivalenceClass
machinery and the qual simplification ?
For example, if the target audience of this patch is ORM, or inlined
views, it wouldn't surprise me to see queries of this kind in the wild,
which could be avoided altogether:postgres=# explain (costs off) select * from sj s1 join sj s2 on s1.a =
s2.a where s1.b = 2 and s2.b =3;
QUERY PLAN
-----------------------------------------------------
Seq Scan on sj s2
Filter: ((a IS NOT NULL) AND (b = 3) AND (b = 2))
(2 lignes)+ for (counter = 0; counter < list_length(*sources);) + { + ListCell *cell = list_nth_cell(*sources, counter); + RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell)); + int counter1; + .... + ec->ec_members = list_delete_cell(ec->ec_members, cell);Why don't you use foreach() and foreach_delete_current macros for
iterating and removing items in the lists, both in update_ec_members and
update_ec_sources ?+ if ((bms_is_member(k, info->syn_lefthand) ^ + bms_is_member(r, info->syn_lefthand)) || + (bms_is_member(k, info->syn_righthand) ^ + bms_is_member(r, info->syn_righthand)))I think this is more compact and easier to follow than the previous
version, but I'm not sure how common it is in postgres source code to use
that kind of construction ?Some review about the comments:
I see you keep using the terms "the keeping relation" and "the removing
relation" in reference to the relation that is kept and the one that is
removed.
Aside from the grammar (the kept relation or the removed relation), maybe
it would make it clearer to call them something else. In other parts of the
code, you used "the remaining relation / the removed relation" which makes
sense./* * Remove the target relid from the planner's data structures, having - * determined that there is no need to include it in the query. + * determined that there is no need to include it in the query. Or replace + * with another relid. + * To reusability, this routine can work in two modes: delete relid from a plan + * or replace it. It is used in replace mode in a self-join removing process.This could be rephrased: ", optionally replacing it with another relid.
The latter is used by the self-join removing process."[1]
/messages/by-id/830269.1656693747@sss.pgh.pa.us--
Ronan DunklauHi,
bq. this is more compact and easier to follow than the previous version
A code comment can be added above the expression (involving XOR) to explain
the purpose of the expression.
Cheers
On 30/6/2022 17:11, Andrey Lepikhov wrote:
On 19/5/2022 16:47, Ronan Dunklau wrote:
I'll take a look at that one.
New version of the patch, rebased on current master:
1. pgindent over the patch have passed.
2. number of changed files is reduced.
3. Some documentation and comments is added.
New version rebased on new master, minor changes and tests added.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
v36-0001-Remove-self-joins.patchtext/plain; charset=UTF-8; name=v36-0001-Remove-self-joins.patchDownload
From 8d864515da68728ddee10d455f8bdb64d34277aa Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Fri, 26 Aug 2022 15:17:53 +0300
Subject: [PATCH] Remove self-joins.
A Self Join Removal (SJR) feature removes inner join of plane table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1045 ++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/utils/misc/guc.c | 10 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 735 +++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 329 +++++++
src/tools/pgindent/typedefs.list | 2 +
13 files changed, 2215 insertions(+), 26 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a5cd4e44c7..2cfb62f97f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5297,6 +5297,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ query tree and replaces self joins with semantically equivalent single
+ scans. Take into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 045ff2e487..a9f8f89312 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3495,8 +3495,24 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
+ List *exprs;
Assert(list_length(exprlist) == list_length(oprlist));
@@ -3551,6 +3567,8 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ exprs = NIL;
+
/*
* If the index is not unique, or not immediately enforced, or if it's
* a partial index that doesn't match the query, it's useless here.
@@ -3598,6 +3616,23 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniquiness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3640,7 +3675,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index bbeca9a9ab..c7d30ba8ca 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -32,21 +33,40 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * extra_clauses contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for SJ checking procedure: SJ can
+ * be removed if outer relation contains strictly the same set of clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids joinrelids);
+ Relids joinrelids, int subst_relid);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static void change_rinfo(RestrictInfo *rinfo, Index from, Index to);
+static Bitmapset *change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
/*
@@ -86,7 +106,7 @@ restart:
remove_rel_from_query(root, innerrelid,
bms_union(sjinfo->min_lefthand,
- sjinfo->min_righthand));
+ sjinfo->min_righthand), 0);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -287,7 +307,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -300,7 +320,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
*
* We are not terribly thorough here. We must make sure that the rel is
* no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +332,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* lists, but only if they belong to the outer join identified by joinrelids.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+ int subst_relid)
{
RelOptInfo *rel = find_base_rel(root, relid);
List *joininfos;
Index rti;
ListCell *l;
+ Assert(subst_relid == 0 || relid != subst_relid);
+
/*
* Mark the rel as "dead" to show it is no longer part of the join tree.
* (Removing it from the baserel array altogether seems too risky.)
@@ -345,8 +371,11 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ change_relid(otherrel->attr_needed[attroff], relid, subst_relid);
}
+
+ /* Update lateral references. */
+ change_varno((Expr *) otherrel->lateral_vars, relid, subst_relid);
}
/*
@@ -361,10 +390,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(l);
- sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, relid);
- sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid);
- sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid);
- sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid);
+ sjinfo->min_lefthand = change_relid(sjinfo->min_lefthand, relid, subst_relid);
+ sjinfo->min_righthand = change_relid(sjinfo->min_righthand, relid, subst_relid);
+ sjinfo->syn_lefthand = change_relid(sjinfo->syn_lefthand, relid, subst_relid);
+ sjinfo->syn_righthand = change_relid(sjinfo->syn_righthand, relid, subst_relid);
+
+ change_varno((Expr *) sjinfo->semi_rhs_exprs, relid, subst_relid);
}
/*
@@ -385,8 +416,7 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (subst_relid == 0 && bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
@@ -395,9 +425,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
}
else
{
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
+ phinfo->ph_eval_at = change_relid(phinfo->ph_eval_at, relid, subst_relid);
Assert(!bms_is_empty(phinfo->ph_eval_at));
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
+ phinfo->ph_needed = change_relid(phinfo->ph_needed, relid, subst_relid);
+ Assert(subst_relid != 0 || !bms_is_member(relid, phinfo->ph_lateral));
+ phinfo->ph_lateral = change_relid(phinfo->ph_lateral, relid, subst_relid);
+ phinfo->ph_var->phrels = change_relid(phinfo->ph_var->phrels, relid, subst_relid);
+ }
+ }
+
+ if (subst_relid != 0)
+ {
+ foreach(l, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ change_rinfo(rinfo, relid, subst_relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
}
}
@@ -422,6 +466,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ /*
+ * replace relid with relid of keeping relation. After that, this
+ * restriction can be attached to keeping relation.
+ */
+ change_rinfo(rinfo, relid, subst_relid);
+
if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
{
/* Recheck that qual doesn't actually reference the target rel */
@@ -436,6 +486,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
relid);
distribute_restrictinfo_to_rels(root, rinfo);
}
+ else if (subst_relid != 0)
+ distribute_restrictinfo_to_rels(root, rinfo);
}
/*
@@ -639,9 +691,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains right sides of baserestrictinfo clauses looks like
+ * x = const if distinctness is derived from such clauses, not joininfo clause.
+ * Pass NULL for the param value, if it is not interested.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -654,10 +711,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -971,9 +1029,21 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+bool
+innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids, Relids outerrelids,
+ RelOptInfo *innerrel, JoinType jointype,
+ List *restrictlist, bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -995,10 +1065,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1015,7 +1089,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1028,10 +1102,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1077,7 +1156,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1119,5 +1199,920 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+typedef struct ChangeVarnoContext
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo *) node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ if (newRelid == 0)
+ return;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static Bitmapset *
+change_relid(Relids relids, Index oldId, Index newId)
+{
+ if (newId == 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+static void
+change_rinfo(RestrictInfo *rinfo, Index from, Index to)
+{
+ bool is_req_equal;
+
+ if (to == 0)
+ return;
+
+ is_req_equal = (rinfo->required_relids == rinfo->clause_relids);
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = change_relid(rinfo->required_relids, from, to);
+ rinfo->left_relids = change_relid(rinfo->left_relids, from, to);
+ rinfo->right_relids = change_relid(rinfo->right_relids, from, to);
+ rinfo->outer_relids = change_relid(rinfo->outer_relids, from, to);
+ rinfo->nullable_relids = change_relid(rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members);)
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ em->em_relids = change_relid(em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it is,
+ * delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(*sources);)
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ *sources = list_delete_cell(*sources, cell);
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *cell;
+ int i;
+ List *target = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass
+ * mentions of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Now, baserestrictinfo replenished with restrictions from removing
+ * relation. It is needed to remove duplicates and replace degenerated
+ * clauses with a NullTest.
+ */
+ foreach(cell, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ ListCell *otherCell = NULL;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+
+ /* Search for duplicates. */
+ foreach(otherCell, target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ target = lappend(target, rinfo);
+ }
+
+ list_free(toKeep->baserestrictinfo);
+ toKeep->baserestrictinfo = target;
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ ec->ec_relids = change_relid(ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach(cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ /*
+ * XXX: Why here we haven't any differences in removing rmark or
+ * kmark?
+ */
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch (rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr *) rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr *) rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr *) rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, toRemove->relid, toKeep->relid);
+ change_varno((Expr *) root->parse->havingQual, toRemove->relid, toKeep->relid);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) ? */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness [partly] derived from a baserestrictinfo
+ * clause. In this case we have a chance to return only one row (if such
+ * clauses on both sides of SJ is equal) or nothing - if they are different.
+ */
+static bool
+match_unique_clauses(RelOptInfo *outer, List *uclauses, Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ change_varno(clause, relid, outer->relid);
+
+ c1 = (rinfo->outer_is_left ? get_leftop(clause) : get_rightop(clause));
+ iclause = (rinfo->outer_is_left ?
+ get_rightop(clause) : get_leftop(clause));
+
+ /*
+ * Compare these left and right sides with corresponding sides of the
+ * outer's filters. If no one detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ oclause = (bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause));
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids joinrelids = NULL;
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ if (bms_num_members(relids) < 2)
+ return NULL;
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * At this stage joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence on this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ outer->relids,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ /* Is it an unique self join? */
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX: we would detect self-join without quals like 'x==x' if
+ * we had an foreign key constraint on some of other quals and
+ * this join haven't any columns from the outer in the target
+ * list. But it is still complex task.
+ */
+ continue;
+ }
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of join.
+ */
+ if (!match_unique_clauses(outer, uclauses, inner->relid))
+ continue;
+
+ /* See for row marks. */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+
+ /*
+ * Attach the restrictions to removing relation to simplify relids
+ * replacing procedure. joininfo clauses of removing relation will
+ * be moved to the keeping by the remove_rel_from_query().
+ */
+ outer->baserestrictinfo = list_concat(outer->baserestrictinfo,
+ restrictlist);
+
+ /* Firstly, replace index of excluding relation with keeping. */
+ remove_rel_from_query(root, outer->relid, joinrelids, inner->relid);
+
+ /* Secondly, fix restrictions of keeping relation */
+ remove_self_join_rel(root, imark, omark, inner, outer);
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+static Relids
+remove_self_joins_one_level(PlannerInfo *root, Relids relids, Relids ToRemove)
+{
+ while (!bms_is_empty(relids))
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int i;
+
+ i = bms_first_member(relids);
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((i = bms_next_member(relids, i)) > 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[i];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, i);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+ return ToRemove;
+}
+
+/*
+ * For each level of joinlist form a set of base relations and launch the
+ * routine of the self-join removal optimization. Recurse into sub-joinlists to
+ * handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *lc;
+ Relids relids = NULL;
+
+ /* Collect the ids of base relations at one level of the join tree. */
+ foreach(lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root,
+ (List *) lfirst(lc),
+ ToRemove);
+ break;
+ case T_RangeTblRef:
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind != RTE_RELATION ||
+ rte->relkind != RELKIND_RELATION ||
+ root->simple_rel_array[ref->rtindex] == NULL)
+ break;
+
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+
+ /*
+ * Limit the number of joins we process to control the
+ * quadratic behavior.
+ */
+ if (bms_num_members(relids) > join_collapse_limit)
+ break;
+ }
+ break;
+ default:
+ Assert(false);
+ }
+ }
+
+ if (bms_num_members(relids) >= 2)
+ ToRemove = remove_self_joins_one_level(root, relids, ToRemove);
+
+ bms_free(relids);
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 248cde4d9b..f35d1b6e54 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -228,6 +228,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9fbbfb1be5..0c007c23e4 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1226,6 +1226,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index d11cdac7f8..e5f5439894 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -76,6 +76,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 1566f435b3..296c11b8a1 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -107,6 +108,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 2ed2e542a4..9c83fc76e5 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4982,6 +4982,741 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = (a * a))
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 4e775af175..c734d5eca7 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 27e7e741a1..5d80af97dc 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1793,6 +1793,335 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a4a4e356e5..c60bb12451 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -362,6 +362,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ChangeVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -3768,6 +3769,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.37.2
Hi,
For v36-0001-Remove-self-joins.patch :
bq removes inner join of plane table to itself
plane table -> plain table
For relation_has_unique_index_ext(), it seems when extra_clauses is NULL,
there is no need to compute `exprs`.
Cheers
Show quoted text
On Fri, Aug 26, 2022 at 3:02 PM Zhihong Yu <zyu@yugabyte.com> wrote:
Hi,
For v36-0001-Remove-self-joins.patch :bq removes inner join of plane table to itself
plane table -> plain table
For relation_has_unique_index_ext(), it seems when extra_clauses is NULL,
there is no need to compute `exprs`.Cheers
For remove_self_joins_recurse():
+ if (bms_num_members(relids) > join_collapse_limit)
+ break;
The above line just comes out of the switch statement. This check should be
done again between foreach and switch.
Otherwise the above check wouldn't achieve what you want.
Cheers
On 8/29/22 04:39, Zhihong Yu wrote:
On Fri, Aug 26, 2022 at 3:02 PM Zhihong Yu <zyu@yugabyte.com
<mailto:zyu@yugabyte.com>> wrote:Hi,
For v36-0001-Remove-self-joins.patch :bq removes inner join of plane table to itself
plane table -> plain table
For relation_has_unique_index_ext(), it seems when extra_clauses
is NULL, there is no need to compute `exprs`.Cheers
Done
For remove_self_joins_recurse():
+ if (bms_num_members(relids) > join_collapse_limit) + break;The above line just comes out of the switch statement. This check should
be done again between foreach and switch.
Otherwise the above check wouldn't achieve what you want.Cheers
Thanks for highlighting the problem.
I guess, usage either of join_collapse_limit or from_collapse_limit
isn't practical here.
That we really afraid here - many senseless search cycles of self-joins.
And it may have higher limit than GUCs above. So I introduced a guc,
called "self_join_search_limit" (so far undocumented) that is an
explicit limit for a set of plain relations in FROM-list to search
self-joins.
--
Regards
Andrey Lepikhov
Postgres Professional
Attachments:
v37-0001-Remove-self-joins.patchtext/x-patch; charset=UTF-8; name=v37-0001-Remove-self-joins.patchDownload
From 6283d6e21214e34d3c1a6351fa9f6ac1aeb75ce8 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Fri, 26 Aug 2022 15:17:53 +0300
Subject: [PATCH] Remove self-joins.
A Self Join Removal (SJR) feature removes inner join of plain table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 38 +
src/backend/optimizer/plan/analyzejoins.c | 1046 ++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/utils/misc/guc.c | 22 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 7 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 774 +++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 340 +++++++
src/tools/pgindent/typedefs.list | 2 +
13 files changed, 2278 insertions(+), 26 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a5cd4e44c7..2cfb62f97f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5297,6 +5297,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ query tree and replaces self joins with semantically equivalent single
+ scans. Take into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 045ff2e487..c41e7256be 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3495,6 +3495,21 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3550,6 +3565,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3598,6 +3614,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniquiness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3640,7 +3674,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index bbeca9a9ab..2796feef74 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -32,21 +33,41 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * extra_clauses contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for SJ checking procedure: SJ can
+ * be removed if outer relation contains strictly the same set of clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+bool enable_self_join_removal;
+int self_join_search_limit;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids joinrelids);
+ Relids joinrelids, int subst_relid);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static void change_rinfo(RestrictInfo *rinfo, Index from, Index to);
+static Bitmapset *change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
/*
@@ -86,7 +107,7 @@ restart:
remove_rel_from_query(root, innerrelid,
bms_union(sjinfo->min_lefthand,
- sjinfo->min_righthand));
+ sjinfo->min_righthand), 0);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -287,7 +308,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -300,7 +321,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
*
* We are not terribly thorough here. We must make sure that the rel is
* no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +333,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* lists, but only if they belong to the outer join identified by joinrelids.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+ int subst_relid)
{
RelOptInfo *rel = find_base_rel(root, relid);
List *joininfos;
Index rti;
ListCell *l;
+ Assert(subst_relid == 0 || relid != subst_relid);
+
/*
* Mark the rel as "dead" to show it is no longer part of the join tree.
* (Removing it from the baserel array altogether seems too risky.)
@@ -345,8 +372,11 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ change_relid(otherrel->attr_needed[attroff], relid, subst_relid);
}
+
+ /* Update lateral references. */
+ change_varno((Expr *) otherrel->lateral_vars, relid, subst_relid);
}
/*
@@ -361,10 +391,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(l);
- sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, relid);
- sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid);
- sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid);
- sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid);
+ sjinfo->min_lefthand = change_relid(sjinfo->min_lefthand, relid, subst_relid);
+ sjinfo->min_righthand = change_relid(sjinfo->min_righthand, relid, subst_relid);
+ sjinfo->syn_lefthand = change_relid(sjinfo->syn_lefthand, relid, subst_relid);
+ sjinfo->syn_righthand = change_relid(sjinfo->syn_righthand, relid, subst_relid);
+
+ change_varno((Expr *) sjinfo->semi_rhs_exprs, relid, subst_relid);
}
/*
@@ -385,8 +417,7 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (subst_relid == 0 && bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
@@ -395,9 +426,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
}
else
{
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
+ phinfo->ph_eval_at = change_relid(phinfo->ph_eval_at, relid, subst_relid);
Assert(!bms_is_empty(phinfo->ph_eval_at));
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
+ phinfo->ph_needed = change_relid(phinfo->ph_needed, relid, subst_relid);
+ Assert(subst_relid != 0 || !bms_is_member(relid, phinfo->ph_lateral));
+ phinfo->ph_lateral = change_relid(phinfo->ph_lateral, relid, subst_relid);
+ phinfo->ph_var->phrels = change_relid(phinfo->ph_var->phrels, relid, subst_relid);
+ }
+ }
+
+ if (subst_relid != 0)
+ {
+ foreach(l, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ change_rinfo(rinfo, relid, subst_relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
}
}
@@ -422,6 +467,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ /*
+ * replace relid with relid of keeping relation. After that, this
+ * restriction can be attached to keeping relation.
+ */
+ change_rinfo(rinfo, relid, subst_relid);
+
if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
{
/* Recheck that qual doesn't actually reference the target rel */
@@ -436,6 +487,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
relid);
distribute_restrictinfo_to_rels(root, rinfo);
}
+ else if (subst_relid != 0)
+ distribute_restrictinfo_to_rels(root, rinfo);
}
/*
@@ -639,9 +692,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains right sides of baserestrictinfo clauses looks like
+ * x = const if distinctness is derived from such clauses, not joininfo clause.
+ * Pass NULL for the param value, if it is not interested.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -654,10 +712,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -971,9 +1030,21 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+bool
+innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids, Relids outerrelids,
+ RelOptInfo *innerrel, JoinType jointype,
+ List *restrictlist, bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -995,10 +1066,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1015,7 +1090,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1028,10 +1103,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1077,7 +1157,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1119,5 +1200,920 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+typedef struct ChangeVarnoContext
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo *) node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ if (newRelid == 0)
+ return;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static Bitmapset *
+change_relid(Relids relids, Index oldId, Index newId)
+{
+ if (newId == 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+static void
+change_rinfo(RestrictInfo *rinfo, Index from, Index to)
+{
+ bool is_req_equal;
+
+ if (to == 0)
+ return;
+
+ is_req_equal = (rinfo->required_relids == rinfo->clause_relids);
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = change_relid(rinfo->required_relids, from, to);
+ rinfo->left_relids = change_relid(rinfo->left_relids, from, to);
+ rinfo->right_relids = change_relid(rinfo->right_relids, from, to);
+ rinfo->outer_relids = change_relid(rinfo->outer_relids, from, to);
+ rinfo->nullable_relids = change_relid(rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members);)
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ em->em_relids = change_relid(em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it is,
+ * delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(*sources);)
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ *sources = list_delete_cell(*sources, cell);
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *cell;
+ int i;
+ List *target = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass
+ * mentions of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Now, baserestrictinfo replenished with restrictions from removing
+ * relation. It is needed to remove duplicates and replace degenerated
+ * clauses with a NullTest.
+ */
+ foreach(cell, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ ListCell *otherCell = NULL;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+
+ /* Search for duplicates. */
+ foreach(otherCell, target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ target = lappend(target, rinfo);
+ }
+
+ list_free(toKeep->baserestrictinfo);
+ toKeep->baserestrictinfo = target;
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ ec->ec_relids = change_relid(ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach(cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ /*
+ * XXX: Why here we haven't any differences in removing rmark or
+ * kmark?
+ */
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch (rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr *) rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr *) rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr *) rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, toRemove->relid, toKeep->relid);
+ change_varno((Expr *) root->parse->havingQual, toRemove->relid, toKeep->relid);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) ? */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness [partly] derived from a baserestrictinfo
+ * clause. In this case we have a chance to return only one row (if such
+ * clauses on both sides of SJ is equal) or nothing - if they are different.
+ */
+static bool
+match_unique_clauses(RelOptInfo *outer, List *uclauses, Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ change_varno(clause, relid, outer->relid);
+
+ c1 = (rinfo->outer_is_left ? get_leftop(clause) : get_rightop(clause));
+ iclause = (rinfo->outer_is_left ?
+ get_rightop(clause) : get_leftop(clause));
+
+ /*
+ * Compare these left and right sides with corresponding sides of the
+ * outer's filters. If no one detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ oclause = (bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause));
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids joinrelids = NULL;
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ if (bms_num_members(relids) < 2)
+ return NULL;
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * At this stage joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence on this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ outer->relids,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ /* Is it an unique self join? */
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX: we would detect self-join without quals like 'x==x' if
+ * we had an foreign key constraint on some of other quals and
+ * this join haven't any columns from the outer in the target
+ * list. But it is still complex task.
+ */
+ continue;
+ }
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of join.
+ */
+ if (!match_unique_clauses(outer, uclauses, inner->relid))
+ continue;
+
+ /* See for row marks. */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+
+ /*
+ * Attach the restrictions to removing relation to simplify relids
+ * replacing procedure. joininfo clauses of removing relation will
+ * be moved to the keeping by the remove_rel_from_query().
+ */
+ outer->baserestrictinfo = list_concat(outer->baserestrictinfo,
+ restrictlist);
+
+ /* Firstly, replace index of excluding relation with keeping. */
+ remove_rel_from_query(root, outer->relid, joinrelids, inner->relid);
+
+ /* Secondly, fix restrictions of keeping relation */
+ remove_self_join_rel(root, imark, omark, inner, outer);
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+static Relids
+remove_self_joins_one_level(PlannerInfo *root, Relids relids, Relids ToRemove)
+{
+ while (!bms_is_empty(relids))
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int i;
+
+ i = bms_first_member(relids);
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((i = bms_next_member(relids, i)) > 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[i];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, i);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+ return ToRemove;
+}
+
+/*
+ * For each level of joinlist form a set of base relations and launch the
+ * routine of the self-join removal optimization. Recurse into sub-joinlists to
+ * handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *lc;
+ Relids relids = NULL;
+
+ /* Collect the ids of base relations at one level of the join tree. */
+ foreach(lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root,
+ (List *) lfirst(lc),
+ ToRemove);
+ break;
+ case T_RangeTblRef:
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind != RTE_RELATION ||
+ rte->relkind != RELKIND_RELATION ||
+ root->simple_rel_array[ref->rtindex] == NULL)
+ break;
+
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+ }
+ break;
+ default:
+ Assert(false);
+ }
+
+ /*
+ * Limit the number of joins we process to control the
+ * quadratic behavior.
+ */
+ if (bms_num_members(relids) >= self_join_search_limit)
+ break;
+ }
+
+ if (bms_num_members(relids) >= 2)
+ ToRemove = remove_self_joins_one_level(root, relids, ToRemove);
+
+ bms_free(relids);
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 248cde4d9b..f35d1b6e54 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -228,6 +228,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9fbbfb1be5..3706aadb89 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1226,6 +1226,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
@@ -2251,6 +2261,18 @@ static struct config_int ConfigureNamesInt[] =
8, 1, INT_MAX,
NULL, NULL, NULL
},
+ {
+ {"self_join_search_limit", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Max size of a group of plain tables where planner "
+ "will lookup for self joins."),
+ gettext_noop("If a number of tables in FROM-list is exceeded this "
+ "limit, the planner will ignore tables beyond it."),
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &self_join_search_limit,
+ 32, 1, INT_MAX,
+ NULL, NULL, NULL
+ },
{
{"geqo_threshold", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Sets the threshold of FROM items beyond which GEQO is used."),
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index d11cdac7f8..e5f5439894 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -76,6 +76,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 1566f435b3..0d91afb449 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,8 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
+extern PGDLLIMPORT int self_join_search_limit;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -107,6 +109,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 2ed2e542a4..2f62d17dd4 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4982,6 +4982,780 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = (a * a))
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ Join Filter: (j1.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Nested Loop
+ Join Filter: (j1.a = j2.a)
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(9 rows)
+
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j2.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj j3
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+reset self_join_search_limit;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 4e775af175..c734d5eca7 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 27e7e741a1..1c593c1a2f 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1793,6 +1793,346 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+reset self_join_search_limit;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a4a4e356e5..c60bb12451 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -362,6 +362,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ChangeVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -3768,6 +3769,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.34.1
New version, rebased onto current master.
Nothing special, just rebase.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
v38-0001-Remove-self-joins.patchtext/plain; charset=UTF-8; name=v38-0001-Remove-self-joins.patchDownload
From 03aab7a2431032166c9ea5f52fbcccaf7168abec Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Wed, 5 Oct 2022 16:58:34 +0500
Subject: [PATCH] Remove self-joins.
A Self Join Removal (SJR) feature removes inner join of plain table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 38 +
src/backend/optimizer/plan/analyzejoins.c | 1046 ++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/utils/misc/guc_tables.c | 22 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 7 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 774 +++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 340 +++++++
src/tools/pgindent/typedefs.list | 2 +
13 files changed, 2278 insertions(+), 26 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d750290f13..5ce2d4d2fa 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5290,6 +5290,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ query tree and replaces self joins with semantically equivalent single
+ scans. Take into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c31fcc917d..51f672a65c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3513,6 +3513,21 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3568,6 +3583,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3616,6 +3632,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniquiness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3658,7 +3692,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index bbeca9a9ab..2796feef74 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -32,21 +33,41 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * extra_clauses contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for SJ checking procedure: SJ can
+ * be removed if outer relation contains strictly the same set of clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+bool enable_self_join_removal;
+int self_join_search_limit;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids joinrelids);
+ Relids joinrelids, int subst_relid);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static void change_rinfo(RestrictInfo *rinfo, Index from, Index to);
+static Bitmapset *change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
/*
@@ -86,7 +107,7 @@ restart:
remove_rel_from_query(root, innerrelid,
bms_union(sjinfo->min_lefthand,
- sjinfo->min_righthand));
+ sjinfo->min_righthand), 0);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -287,7 +308,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -300,7 +321,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
*
* We are not terribly thorough here. We must make sure that the rel is
* no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +333,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* lists, but only if they belong to the outer join identified by joinrelids.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+ int subst_relid)
{
RelOptInfo *rel = find_base_rel(root, relid);
List *joininfos;
Index rti;
ListCell *l;
+ Assert(subst_relid == 0 || relid != subst_relid);
+
/*
* Mark the rel as "dead" to show it is no longer part of the join tree.
* (Removing it from the baserel array altogether seems too risky.)
@@ -345,8 +372,11 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ change_relid(otherrel->attr_needed[attroff], relid, subst_relid);
}
+
+ /* Update lateral references. */
+ change_varno((Expr *) otherrel->lateral_vars, relid, subst_relid);
}
/*
@@ -361,10 +391,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(l);
- sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, relid);
- sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid);
- sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid);
- sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid);
+ sjinfo->min_lefthand = change_relid(sjinfo->min_lefthand, relid, subst_relid);
+ sjinfo->min_righthand = change_relid(sjinfo->min_righthand, relid, subst_relid);
+ sjinfo->syn_lefthand = change_relid(sjinfo->syn_lefthand, relid, subst_relid);
+ sjinfo->syn_righthand = change_relid(sjinfo->syn_righthand, relid, subst_relid);
+
+ change_varno((Expr *) sjinfo->semi_rhs_exprs, relid, subst_relid);
}
/*
@@ -385,8 +417,7 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (subst_relid == 0 && bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
@@ -395,9 +426,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
}
else
{
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
+ phinfo->ph_eval_at = change_relid(phinfo->ph_eval_at, relid, subst_relid);
Assert(!bms_is_empty(phinfo->ph_eval_at));
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
+ phinfo->ph_needed = change_relid(phinfo->ph_needed, relid, subst_relid);
+ Assert(subst_relid != 0 || !bms_is_member(relid, phinfo->ph_lateral));
+ phinfo->ph_lateral = change_relid(phinfo->ph_lateral, relid, subst_relid);
+ phinfo->ph_var->phrels = change_relid(phinfo->ph_var->phrels, relid, subst_relid);
+ }
+ }
+
+ if (subst_relid != 0)
+ {
+ foreach(l, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ change_rinfo(rinfo, relid, subst_relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
}
}
@@ -422,6 +467,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ /*
+ * replace relid with relid of keeping relation. After that, this
+ * restriction can be attached to keeping relation.
+ */
+ change_rinfo(rinfo, relid, subst_relid);
+
if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
{
/* Recheck that qual doesn't actually reference the target rel */
@@ -436,6 +487,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
relid);
distribute_restrictinfo_to_rels(root, rinfo);
}
+ else if (subst_relid != 0)
+ distribute_restrictinfo_to_rels(root, rinfo);
}
/*
@@ -639,9 +692,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains right sides of baserestrictinfo clauses looks like
+ * x = const if distinctness is derived from such clauses, not joininfo clause.
+ * Pass NULL for the param value, if it is not interested.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -654,10 +712,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -971,9 +1030,21 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+bool
+innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids, Relids outerrelids,
+ RelOptInfo *innerrel, JoinType jointype,
+ List *restrictlist, bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -995,10 +1066,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1015,7 +1090,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1028,10 +1103,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1077,7 +1157,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1119,5 +1200,920 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+typedef struct ChangeVarnoContext
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo *) node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ if (newRelid == 0)
+ return;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static Bitmapset *
+change_relid(Relids relids, Index oldId, Index newId)
+{
+ if (newId == 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+static void
+change_rinfo(RestrictInfo *rinfo, Index from, Index to)
+{
+ bool is_req_equal;
+
+ if (to == 0)
+ return;
+
+ is_req_equal = (rinfo->required_relids == rinfo->clause_relids);
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = change_relid(rinfo->required_relids, from, to);
+ rinfo->left_relids = change_relid(rinfo->left_relids, from, to);
+ rinfo->right_relids = change_relid(rinfo->right_relids, from, to);
+ rinfo->outer_relids = change_relid(rinfo->outer_relids, from, to);
+ rinfo->nullable_relids = change_relid(rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members);)
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ em->em_relids = change_relid(em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it is,
+ * delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(*sources);)
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ *sources = list_delete_cell(*sources, cell);
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *cell;
+ int i;
+ List *target = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass
+ * mentions of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Now, baserestrictinfo replenished with restrictions from removing
+ * relation. It is needed to remove duplicates and replace degenerated
+ * clauses with a NullTest.
+ */
+ foreach(cell, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ ListCell *otherCell = NULL;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+
+ /* Search for duplicates. */
+ foreach(otherCell, target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ target = lappend(target, rinfo);
+ }
+
+ list_free(toKeep->baserestrictinfo);
+ toKeep->baserestrictinfo = target;
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ ec->ec_relids = change_relid(ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach(cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ /*
+ * XXX: Why here we haven't any differences in removing rmark or
+ * kmark?
+ */
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch (rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr *) rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr *) rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr *) rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, toRemove->relid, toKeep->relid);
+ change_varno((Expr *) root->parse->havingQual, toRemove->relid, toKeep->relid);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) ? */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness [partly] derived from a baserestrictinfo
+ * clause. In this case we have a chance to return only one row (if such
+ * clauses on both sides of SJ is equal) or nothing - if they are different.
+ */
+static bool
+match_unique_clauses(RelOptInfo *outer, List *uclauses, Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ change_varno(clause, relid, outer->relid);
+
+ c1 = (rinfo->outer_is_left ? get_leftop(clause) : get_rightop(clause));
+ iclause = (rinfo->outer_is_left ?
+ get_rightop(clause) : get_leftop(clause));
+
+ /*
+ * Compare these left and right sides with corresponding sides of the
+ * outer's filters. If no one detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ oclause = (bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause));
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids joinrelids = NULL;
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ if (bms_num_members(relids) < 2)
+ return NULL;
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * At this stage joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence on this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ outer->relids,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ /* Is it an unique self join? */
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX: we would detect self-join without quals like 'x==x' if
+ * we had an foreign key constraint on some of other quals and
+ * this join haven't any columns from the outer in the target
+ * list. But it is still complex task.
+ */
+ continue;
+ }
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of join.
+ */
+ if (!match_unique_clauses(outer, uclauses, inner->relid))
+ continue;
+
+ /* See for row marks. */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+
+ /*
+ * Attach the restrictions to removing relation to simplify relids
+ * replacing procedure. joininfo clauses of removing relation will
+ * be moved to the keeping by the remove_rel_from_query().
+ */
+ outer->baserestrictinfo = list_concat(outer->baserestrictinfo,
+ restrictlist);
+
+ /* Firstly, replace index of excluding relation with keeping. */
+ remove_rel_from_query(root, outer->relid, joinrelids, inner->relid);
+
+ /* Secondly, fix restrictions of keeping relation */
+ remove_self_join_rel(root, imark, omark, inner, outer);
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+static Relids
+remove_self_joins_one_level(PlannerInfo *root, Relids relids, Relids ToRemove)
+{
+ while (!bms_is_empty(relids))
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int i;
+
+ i = bms_first_member(relids);
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((i = bms_next_member(relids, i)) > 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[i];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, i);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+ return ToRemove;
+}
+
+/*
+ * For each level of joinlist form a set of base relations and launch the
+ * routine of the self-join removal optimization. Recurse into sub-joinlists to
+ * handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *lc;
+ Relids relids = NULL;
+
+ /* Collect the ids of base relations at one level of the join tree. */
+ foreach(lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root,
+ (List *) lfirst(lc),
+ ToRemove);
+ break;
+ case T_RangeTblRef:
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind != RTE_RELATION ||
+ rte->relkind != RELKIND_RELATION ||
+ root->simple_rel_array[ref->rtindex] == NULL)
+ break;
+
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+ }
+ break;
+ default:
+ Assert(false);
+ }
+
+ /*
+ * Limit the number of joins we process to control the
+ * quadratic behavior.
+ */
+ if (bms_num_members(relids) >= self_join_search_limit)
+ break;
+ }
+
+ if (bms_num_members(relids) >= 2)
+ ToRemove = remove_self_joins_one_level(root, relids, ToRemove);
+
+ bms_free(relids);
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 248cde4d9b..f35d1b6e54 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -228,6 +228,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 05ab087934..870660c2f6 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -967,6 +967,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
@@ -1992,6 +2002,18 @@ struct config_int ConfigureNamesInt[] =
8, 1, INT_MAX,
NULL, NULL, NULL
},
+ {
+ {"self_join_search_limit", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Max size of a group of plain tables where planner "
+ "will lookup for self joins."),
+ gettext_noop("If a number of tables in FROM-list is exceeded this "
+ "limit, the planner will ignore tables beyond it."),
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &self_join_search_limit,
+ 32, 1, INT_MAX,
+ NULL, NULL, NULL
+ },
{
{"geqo_threshold", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Sets the threshold of FROM items beyond which GEQO is used."),
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 41f765d342..eb698c3afb 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -75,6 +75,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9dffdcfd1e..37b05d0120 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,8 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
+extern PGDLLIMPORT int self_join_search_limit;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -107,6 +109,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 08334761ae..776de859f6 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4980,6 +4980,780 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = (a * a))
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ Join Filter: (j1.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Nested Loop
+ Join Filter: (j1.a = j2.a)
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(9 rows)
+
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j2.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj j3
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+reset self_join_search_limit;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t3.id = sj_t2.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 579b861d84..4c67ae1a93 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -128,10 +128,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(20 rows)
+(21 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 27e7e741a1..1c593c1a2f 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1793,6 +1793,346 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+reset self_join_search_limit;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 97c9bc1861..b83e509faa 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -363,6 +363,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ChangeVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -3754,6 +3755,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.37.3
Hi,
On 2022-10-05 17:25:18 +0500, Andrey Lepikhov wrote:
New version, rebased onto current master.
Nothing special, just rebase.
This doesn't pass the main regression tests due to a plan difference.
https://cirrus-ci.com/task/5537518245380096
https://api.cirrus-ci.com/v1/artifact/task/5537518245380096/testrun/build/testrun/regress/regress/regression.diffs
diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/join.out 2022-12-05 19:11:52.453920838 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out 2022-12-05 19:15:21.864183651 +0000
@@ -5806,7 +5806,7 @@
Nested Loop
Join Filter: (sj_t3.id = sj_t1.id)
-> Nested Loop
- Join Filter: (sj_t3.id = sj_t2.id)
+ Join Filter: (sj_t2.id = sj_t3.id)
-> Nested Loop Semi Join
-> Nested Loop
-> HashAggregate
Greetings,
Andres Freund
On 12/6/22 23:46, Andres Freund wrote:
This doesn't pass the main regression tests due to a plan difference.
https://cirrus-ci.com/task/5537518245380096
https://api.cirrus-ci.com/v1/artifact/task/5537518245380096/testrun/build/testrun/regress/regress/regression.diffsdiff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out --- /tmp/cirrus-ci-build/src/test/regress/expected/join.out 2022-12-05 19:11:52.453920838 +0000 +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out 2022-12-05 19:15:21.864183651 +0000 @@ -5806,7 +5806,7 @@ Nested Loop Join Filter: (sj_t3.id = sj_t1.id) -> Nested Loop - Join Filter: (sj_t3.id = sj_t2.id) + Join Filter: (sj_t2.id = sj_t3.id) -> Nested Loop Semi Join -> Nested Loop -> HashAggregate
This change in the test behaviour is induced by the a5fc4641
"Avoid making commutatively-duplicate clauses in EquivalenceClasses."
Nothing special, as I see. Attached patch fixes this.
--
Regards
Andrey Lepikhov
Postgres Professional
Attachments:
v39-0001-Remove-self-joins.patchtext/x-patch; charset=UTF-8; name=v39-0001-Remove-self-joins.patchDownload
From 3e546637561bf4c6d195bc7c95b1e05263e797e2 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Wed, 5 Oct 2022 16:58:34 +0500
Subject: [PATCH] Remove self-joins.
A Self Join Removal (SJR) feature removes inner join of plain table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.
In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.
Some regression tests changed due to self-join removal logic.
[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 38 +
src/backend/optimizer/plan/analyzejoins.c | 1046 ++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/utils/misc/guc_tables.c | 22 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 7 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 774 +++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 340 +++++++
src/tools/pgindent/typedefs.list | 2 +
13 files changed, 2278 insertions(+), 26 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8e4145979d..2f9948d5f8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5311,6 +5311,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ query tree and replaces self joins with semantically equivalent single
+ scans. Take into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 914bfd90bc..8d57c68b1f 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,21 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3549,6 +3564,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3597,6 +3613,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniquiness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3639,7 +3673,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index bbeca9a9ab..2796feef74 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -32,21 +33,41 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * extra_clauses contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for SJ checking procedure: SJ can
+ * be removed if outer relation contains strictly the same set of clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+bool enable_self_join_removal;
+int self_join_search_limit;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids joinrelids);
+ Relids joinrelids, int subst_relid);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static void change_rinfo(RestrictInfo *rinfo, Index from, Index to);
+static Bitmapset *change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
/*
@@ -86,7 +107,7 @@ restart:
remove_rel_from_query(root, innerrelid,
bms_union(sjinfo->min_lefthand,
- sjinfo->min_righthand));
+ sjinfo->min_righthand), 0);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -287,7 +308,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -300,7 +321,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
*
* We are not terribly thorough here. We must make sure that the rel is
* no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +333,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* lists, but only if they belong to the outer join identified by joinrelids.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+ int subst_relid)
{
RelOptInfo *rel = find_base_rel(root, relid);
List *joininfos;
Index rti;
ListCell *l;
+ Assert(subst_relid == 0 || relid != subst_relid);
+
/*
* Mark the rel as "dead" to show it is no longer part of the join tree.
* (Removing it from the baserel array altogether seems too risky.)
@@ -345,8 +372,11 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ change_relid(otherrel->attr_needed[attroff], relid, subst_relid);
}
+
+ /* Update lateral references. */
+ change_varno((Expr *) otherrel->lateral_vars, relid, subst_relid);
}
/*
@@ -361,10 +391,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(l);
- sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, relid);
- sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid);
- sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid);
- sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid);
+ sjinfo->min_lefthand = change_relid(sjinfo->min_lefthand, relid, subst_relid);
+ sjinfo->min_righthand = change_relid(sjinfo->min_righthand, relid, subst_relid);
+ sjinfo->syn_lefthand = change_relid(sjinfo->syn_lefthand, relid, subst_relid);
+ sjinfo->syn_righthand = change_relid(sjinfo->syn_righthand, relid, subst_relid);
+
+ change_varno((Expr *) sjinfo->semi_rhs_exprs, relid, subst_relid);
}
/*
@@ -385,8 +417,7 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (subst_relid == 0 && bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
@@ -395,9 +426,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
}
else
{
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
+ phinfo->ph_eval_at = change_relid(phinfo->ph_eval_at, relid, subst_relid);
Assert(!bms_is_empty(phinfo->ph_eval_at));
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
+ phinfo->ph_needed = change_relid(phinfo->ph_needed, relid, subst_relid);
+ Assert(subst_relid != 0 || !bms_is_member(relid, phinfo->ph_lateral));
+ phinfo->ph_lateral = change_relid(phinfo->ph_lateral, relid, subst_relid);
+ phinfo->ph_var->phrels = change_relid(phinfo->ph_var->phrels, relid, subst_relid);
+ }
+ }
+
+ if (subst_relid != 0)
+ {
+ foreach(l, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ change_rinfo(rinfo, relid, subst_relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
}
}
@@ -422,6 +467,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ /*
+ * replace relid with relid of keeping relation. After that, this
+ * restriction can be attached to keeping relation.
+ */
+ change_rinfo(rinfo, relid, subst_relid);
+
if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
{
/* Recheck that qual doesn't actually reference the target rel */
@@ -436,6 +487,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
relid);
distribute_restrictinfo_to_rels(root, rinfo);
}
+ else if (subst_relid != 0)
+ distribute_restrictinfo_to_rels(root, rinfo);
}
/*
@@ -639,9 +692,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains right sides of baserestrictinfo clauses looks like
+ * x = const if distinctness is derived from such clauses, not joininfo clause.
+ * Pass NULL for the param value, if it is not interested.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -654,10 +712,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -971,9 +1030,21 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+bool
+innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids, Relids outerrelids,
+ RelOptInfo *innerrel, JoinType jointype,
+ List *restrictlist, bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -995,10 +1066,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1015,7 +1090,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1028,10 +1103,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1077,7 +1157,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1119,5 +1200,920 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+typedef struct ChangeVarnoContext
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo *) node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ if (newRelid == 0)
+ return;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static Bitmapset *
+change_relid(Relids relids, Index oldId, Index newId)
+{
+ if (newId == 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+static void
+change_rinfo(RestrictInfo *rinfo, Index from, Index to)
+{
+ bool is_req_equal;
+
+ if (to == 0)
+ return;
+
+ is_req_equal = (rinfo->required_relids == rinfo->clause_relids);
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = change_relid(rinfo->required_relids, from, to);
+ rinfo->left_relids = change_relid(rinfo->left_relids, from, to);
+ rinfo->right_relids = change_relid(rinfo->right_relids, from, to);
+ rinfo->outer_relids = change_relid(rinfo->outer_relids, from, to);
+ rinfo->nullable_relids = change_relid(rinfo->nullable_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members);)
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ em->em_relids = change_relid(em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it is,
+ * delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(*sources);)
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ *sources = list_delete_cell(*sources, cell);
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *cell;
+ int i;
+ List *target = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass
+ * mentions of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Now, baserestrictinfo replenished with restrictions from removing
+ * relation. It is needed to remove duplicates and replace degenerated
+ * clauses with a NullTest.
+ */
+ foreach(cell, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ ListCell *otherCell = NULL;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+
+ /* Search for duplicates. */
+ foreach(otherCell, target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ target = lappend(target, rinfo);
+ }
+
+ list_free(toKeep->baserestrictinfo);
+ toKeep->baserestrictinfo = target;
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ ec->ec_relids = change_relid(ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach(cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ /*
+ * XXX: Why here we haven't any differences in removing rmark or
+ * kmark?
+ */
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch (rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr *) rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr *) rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr *) rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, toRemove->relid, toKeep->relid);
+ change_varno((Expr *) root->parse->havingQual, toRemove->relid, toKeep->relid);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) ? */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness [partly] derived from a baserestrictinfo
+ * clause. In this case we have a chance to return only one row (if such
+ * clauses on both sides of SJ is equal) or nothing - if they are different.
+ */
+static bool
+match_unique_clauses(RelOptInfo *outer, List *uclauses, Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ change_varno(clause, relid, outer->relid);
+
+ c1 = (rinfo->outer_is_left ? get_leftop(clause) : get_rightop(clause));
+ iclause = (rinfo->outer_is_left ?
+ get_rightop(clause) : get_leftop(clause));
+
+ /*
+ * Compare these left and right sides with corresponding sides of the
+ * outer's filters. If no one detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ oclause = (bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause));
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids joinrelids = NULL;
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ if (bms_num_members(relids) < 2)
+ return NULL;
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * At this stage joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence on this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ outer->relids,
+ inner);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ /* Is it an unique self join? */
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX: we would detect self-join without quals like 'x==x' if
+ * we had an foreign key constraint on some of other quals and
+ * this join haven't any columns from the outer in the target
+ * list. But it is still complex task.
+ */
+ continue;
+ }
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of join.
+ */
+ if (!match_unique_clauses(outer, uclauses, inner->relid))
+ continue;
+
+ /* See for row marks. */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+
+ /*
+ * Attach the restrictions to removing relation to simplify relids
+ * replacing procedure. joininfo clauses of removing relation will
+ * be moved to the keeping by the remove_rel_from_query().
+ */
+ outer->baserestrictinfo = list_concat(outer->baserestrictinfo,
+ restrictlist);
+
+ /* Firstly, replace index of excluding relation with keeping. */
+ remove_rel_from_query(root, outer->relid, joinrelids, inner->relid);
+
+ /* Secondly, fix restrictions of keeping relation */
+ remove_self_join_rel(root, imark, omark, inner, outer);
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+static Relids
+remove_self_joins_one_level(PlannerInfo *root, Relids relids, Relids ToRemove)
+{
+ while (!bms_is_empty(relids))
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int i;
+
+ i = bms_first_member(relids);
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((i = bms_next_member(relids, i)) > 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[i];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, i);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+ return ToRemove;
+}
+
+/*
+ * For each level of joinlist form a set of base relations and launch the
+ * routine of the self-join removal optimization. Recurse into sub-joinlists to
+ * handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *lc;
+ Relids relids = NULL;
+
+ /* Collect the ids of base relations at one level of the join tree. */
+ foreach(lc, joinlist)
+ {
+ switch (((Node *) lfirst(lc))->type)
+ {
+ case T_List:
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root,
+ (List *) lfirst(lc),
+ ToRemove);
+ break;
+ case T_RangeTblRef:
+ {
+ RangeTblRef *ref = (RangeTblRef *) lfirst(lc);
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind != RTE_RELATION ||
+ rte->relkind != RELKIND_RELATION ||
+ root->simple_rel_array[ref->rtindex] == NULL)
+ break;
+
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+ }
+ break;
+ default:
+ Assert(false);
+ }
+
+ /*
+ * Limit the number of joins we process to control the
+ * quadratic behavior.
+ */
+ if (bms_num_members(relids) >= self_join_search_limit)
+ break;
+ }
+
+ if (bms_num_members(relids) >= 2)
+ ToRemove = remove_self_joins_one_level(root, relids, ToRemove);
+
+ bms_free(relids);
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 63deed27c9..bc197ec6bd 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -228,6 +228,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 1bf14eec66..0be3831201 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -981,6 +981,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
@@ -2018,6 +2028,18 @@ struct config_int ConfigureNamesInt[] =
8, 1, INT_MAX,
NULL, NULL, NULL
},
+ {
+ {"self_join_search_limit", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Max size of a group of plain tables where planner "
+ "will lookup for self joins."),
+ gettext_noop("If a number of tables in FROM-list is exceeded this "
+ "limit, the planner will ignore tables beyond it."),
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &self_join_search_limit,
+ 32, 1, INT_MAX,
+ NULL, NULL, NULL
+ },
{
{"geqo_threshold", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Sets the threshold of FROM items beyond which GEQO is used."),
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 41f765d342..eb698c3afb 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -75,6 +75,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9dffdcfd1e..37b05d0120 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,8 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
+extern PGDLLIMPORT int self_join_search_limit;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -107,6 +109,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index b8d43e4c14..94a670a22d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5091,6 +5091,780 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = (a * a))
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ Join Filter: (j1.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Nested Loop
+ Join Filter: (j1.a = j2.a)
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(9 rows)
+
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j2.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj j3
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+reset self_join_search_limit;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 579b861d84..4c67ae1a93 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -128,10 +128,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partition_pruning | on
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(20 rows)
+(21 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 65aab85c35..62e67fab31 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1841,6 +1841,346 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+reset self_join_search_limit;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 60c71d05fe..1a80db4ac5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -363,6 +363,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ChangeVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -3757,6 +3758,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.34.1
Hi All,
I just wanted to ask about the status and plans for this patch.
I can see it being stuck at “Waiting for Author” status in several commit tests.
I think this patch would be really beneficial for us as we heavily use views to structure out code.
Each view is responsible for providing some calculated values and they are joined in a query to retrieve the full set of information.
Not sure how the process works and how I could help (I am absolutely not capable of helping with coding I am afraid - but could sponsor a (small :) ) bounty to speed things up).
Thanks,
Michal
Show quoted text
On 16 Dec 2022, at 07:45, Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote:
On 12/6/22 23:46, Andres Freund wrote:
This doesn't pass the main regression tests due to a plan difference. https://cirrus-ci.com/task/5537518245380096 https://api.cirrus-ci.com/v1/artifact/task/5537518245380096/testrun/build/testrun/regress/regress/regression.diffs diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out --- /tmp/cirrus-ci-build/src/test/regress/expected/join.out 2022-12-05 19:11:52.453920838 +0000 +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out 2022-12-05 19:15:21.864183651 +0000 @@ -5806,7 +5806,7 @@ Nested Loop Join Filter: (sj_t3.id = sj_t1.id) -> Nested Loop - Join Filter: (sj_t3.id = sj_t2.id) + Join Filter: (sj_t2.id = sj_t3.id) -> Nested Loop Semi Join -> Nested Loop -> HashAggregateThis change in the test behaviour is induced by the a5fc4641
"Avoid making commutatively-duplicate clauses in EquivalenceClasses."
Nothing special, as I see. Attached patch fixes this.--
Regards
Andrey Lepikhov
Postgres Professional
<v39-0001-Remove-self-joins.patch>
On Mon, 6 Mar 2023 at 00:30, Michał Kłeczek <michal@kleczek.org> wrote:
Hi All,
I just wanted to ask about the status and plans for this patch.
I can see it being stuck at “Waiting for Author” status in several commit tests.
Sadly it seems to now be badly in need of a rebase. There are large
hunks failing in the guts of analyzejoins.c as well as minor failures
elsewhere and lots of offsets which need to be reviewed.
I think given the lack of activity it's out of time for this release
at this point. I'm moving it ahead to the next CF.
--
Gregory Stark
As Commitfest Manager
On 3/6/23 10:30, Michał Kłeczek wrote:
Hi All,
I just wanted to ask about the status and plans for this patch.
I can see it being stuck at “Waiting for Author” status in several
commit tests.I think this patch would be really beneficial for us as we heavily use
views to structure out code.
Each view is responsible for providing some calculated values and
they > are joined in a query to retrieve the full set of information.
Not sure how the process works and how I could help (I am absolutely
not capable of helping with coding I am afraid - but could sponsor a
(small :) ) bounty to speed things up).
Yes, I am still working on this feature. Because of significant changes
in the optimizer code which Tom & Richard had been doing last months, I
didn't touch it for a while. But now this work can be continued.
Current patch is rebased on current master. Because of the nullable_rels
logic, introduced recently, ojrelids were highly spreaded across planner
bitmapsets. So, JE logic was changed.
But now, I'm less happy with the code. It seems we need to refactor it:
1. According to reports of some performance engineers, the feature can
cause overhead ~0.5% on trivial queries without joins at all. We should
discover the patch and find the way for quick and cheap return, if the
statement contains no one join or, maybe stronger, no one self join.
2. During join elimination we replace clauses like 'x=x' with 'x IS NOT
NULL'. It is a weak point because we change clause semantic
(mergejoinable to non-mergejoinable, in this example) and could forget
consistently change some RestrictInfo fields.
3. In the previous versions we changed the remove_rel_from_query routine
trying to use it in both 'Useless outer join' and 'Self join'
elimination optimizations. Now, because of the 'ojrelid' field it looks
too complicated. Do we need to split this routine again?
--
Regards
Andrey Lepikhov
Postgres Professional
Attachments:
v40-0001-Remove-self-joins.patchtext/x-patch; charset=UTF-8; name=v40-0001-Remove-self-joins.patchDownload
From cb4340577dab0e8cf5531e9934f5734fda178490 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Mon, 15 May 2023 09:04:51 +0500
Subject: [PATCH] Remove self-joins.
A Self Join Elimination (SJE) feature removes inner join of plain table to itself
in a query tree if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation is deduced from baserestrictinfo clause like
'x=const' on unique field(s), check what the inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.
Some regression tests changed due to self-join removal logic.
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 38 +
src/backend/optimizer/plan/analyzejoins.c | 1093 ++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/utils/misc/guc_tables.c | 22 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 7 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 774 +++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 340 +++++++
src/tools/pgindent/typedefs.list | 2 +
13 files changed, 2313 insertions(+), 38 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5da74b3c40..b47d11759e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5437,6 +5437,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ query tree and replaces self joins with semantically equivalent single
+ scans. Take into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 1436dbc2f2..1b1aa9083c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3491,6 +3491,21 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3546,6 +3561,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3594,6 +3610,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniquiness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3636,7 +3670,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index a61e35f92d..b59ee65b5f 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -33,23 +34,43 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * extra_clauses contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for SJ checking procedure: SJ can
+ * be removed if outer relation contains strictly the same set of clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+bool enable_self_join_removal;
+int self_join_search_limit;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
static void remove_rel_from_query(PlannerInfo *root, int relid, int ojrelid,
- Relids joinrelids);
+ Relids joinrelids, int subst_relid);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
- int relid, int ojrelid);
+ int relid, int ojrelid, int subst_relid);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static void change_rinfo(RestrictInfo *rinfo, Index from, Index to);
+static Bitmapset *change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
/*
@@ -93,7 +114,7 @@ restart:
if (sjinfo->ojrelid != 0)
joinrelids = bms_add_member(joinrelids, sjinfo->ojrelid);
- remove_rel_from_query(root, innerrelid, sjinfo->ojrelid, joinrelids);
+ remove_rel_from_query(root, innerrelid, sjinfo->ojrelid, joinrelids, 0);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -312,7 +333,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -325,20 +346,26 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
* Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
*
* We are not terribly thorough here. We only bother to update parts of
* the planner's data structures that will actually be consulted later.
*/
static void
remove_rel_from_query(PlannerInfo *root, int relid, int ojrelid,
- Relids joinrelids)
+ Relids joinrelids, int subst_relid)
{
RelOptInfo *rel = find_base_rel(root, relid);
List *joininfos;
Index rti;
ListCell *l;
+ /* Test on impossible combination */
+ Assert((subst_relid == 0) ^ (ojrelid == 0));
+
/*
* Remove references to the rel from other baserels' attr_needed arrays.
*/
@@ -362,18 +389,30 @@ remove_rel_from_query(PlannerInfo *root, int relid, int ojrelid,
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ change_relid(otherrel->attr_needed[attroff], relid, subst_relid);
+
+ /*
+ * attr_needed could contain "0" and we don't want to delete it,
+ * due to the logic of SJR replacement: it is impossible to detect
+ * SJR for OUTER join, so here ojrelid === 0 each time.
+ */
+ if (subst_relid != 0)
+ continue;
+
otherrel->attr_needed[attroff] =
bms_del_member(otherrel->attr_needed[attroff], ojrelid);
}
+
+ /* Update lateral references. */
+ change_varno((Expr *) otherrel->lateral_vars, relid, subst_relid);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
+ root->all_baserels = change_relid(root->all_baserels, relid, subst_relid);
root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
+ root->all_query_rels = change_relid(root->all_query_rels, relid, subst_relid);
root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
/*
@@ -388,10 +427,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, int ojrelid,
{
SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(l);
- sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, relid);
- sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, relid);
- sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, relid);
- sjinfo->syn_righthand = bms_del_member(sjinfo->syn_righthand, relid);
+ sjinfo->min_lefthand = change_relid(sjinfo->min_lefthand, relid, subst_relid);
+ sjinfo->min_righthand = change_relid(sjinfo->min_righthand, relid, subst_relid);
+ sjinfo->syn_lefthand = change_relid(sjinfo->syn_lefthand, relid, subst_relid);
+ sjinfo->syn_righthand = change_relid(sjinfo->syn_righthand, relid, subst_relid);
sjinfo->min_lefthand = bms_del_member(sjinfo->min_lefthand, ojrelid);
sjinfo->min_righthand = bms_del_member(sjinfo->min_righthand, ojrelid);
sjinfo->syn_lefthand = bms_del_member(sjinfo->syn_lefthand, ojrelid);
@@ -401,6 +440,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, int ojrelid,
sjinfo->commute_above_r = bms_del_member(sjinfo->commute_above_r, ojrelid);
sjinfo->commute_below_l = bms_del_member(sjinfo->commute_below_l, ojrelid);
sjinfo->commute_below_r = bms_del_member(sjinfo->commute_below_r, ojrelid);
+
+ change_varno((Expr *) sjinfo->semi_rhs_exprs, relid, subst_relid);
}
/*
@@ -421,8 +462,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, int ojrelid,
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (subst_relid == 0 &&
+ bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
@@ -433,19 +474,36 @@ remove_rel_from_query(PlannerInfo *root, int relid, int ojrelid,
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
+ phinfo->ph_eval_at = change_relid(phinfo->ph_eval_at, relid, subst_relid);
phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+ phinfo->ph_needed = change_relid(phinfo->ph_needed, relid, subst_relid);
+
+ if (subst_relid == 0)
+ /* See comment above on otherrel->attr_needed arrangement */
+ phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
+ Assert(subst_relid != 0 || !bms_is_member(relid, phinfo->ph_lateral));
+ phinfo->ph_lateral = change_relid(phinfo->ph_lateral, relid, subst_relid);
+ phv->phrels = change_relid(phv->phrels, relid, subst_relid);
phv->phrels = bms_del_member(phv->phrels, ojrelid);
Assert(!bms_is_empty(phv->phrels));
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+ if (subst_relid != 0)
+ {
+ foreach(l, rel->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
+
+ change_rinfo(rinfo, relid, subst_relid);
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ }
+
/*
* Remove any joinquals referencing the rel from the joininfo lists.
*
@@ -466,6 +524,7 @@ remove_rel_from_query(PlannerInfo *root, int relid, int ojrelid,
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ change_rinfo(rinfo, relid, subst_relid);
if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
{
@@ -475,10 +534,12 @@ remove_rel_from_query(PlannerInfo *root, int relid, int ojrelid,
* that include those. We already checked above that any such PHV
* is safe, so we can just drop those references.
*/
- remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
+ remove_rel_from_restrictinfo(rinfo, relid, ojrelid, subst_relid);
/* Now throw it back into the joininfo lists */
distribute_restrictinfo_to_rels(root, rinfo);
}
+ else if (subst_relid != 0)
+ distribute_restrictinfo_to_rels(root, rinfo);
}
/*
@@ -494,7 +555,13 @@ remove_rel_from_query(PlannerInfo *root, int relid, int ojrelid,
root->simple_rel_array[relid] = NULL;
/* And nuke the RelOptInfo, just in case there's another access path */
- pfree(rel);
+ if (subst_relid == 0)
+ /*
+ * Trade-off: can't free this struct in the case of SJR, but don't want
+ * totally change the core logic.
+ * XXX: a redesign of SJR code has been required.
+ */
+ pfree(rel);
}
/*
@@ -506,18 +573,18 @@ remove_rel_from_query(PlannerInfo *root, int relid, int ojrelid,
* we have to also clean up the sub-clauses.
*/
static void
-remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid)
+remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid, int subst_relid)
{
/*
* The clause_relids probably aren't shared with anything else, but let's
* copy them just to be sure.
*/
rinfo->clause_relids = bms_copy(rinfo->clause_relids);
- rinfo->clause_relids = bms_del_member(rinfo->clause_relids, relid);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, relid, subst_relid);
rinfo->clause_relids = bms_del_member(rinfo->clause_relids, ojrelid);
/* Likewise for required_relids */
rinfo->required_relids = bms_copy(rinfo->required_relids);
- rinfo->required_relids = bms_del_member(rinfo->required_relids, relid);
+ rinfo->required_relids = change_relid(rinfo->required_relids, relid, subst_relid);
rinfo->required_relids = bms_del_member(rinfo->required_relids, ojrelid);
/* If it's an OR, recurse to clean up sub-clauses */
@@ -540,14 +607,14 @@ remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid)
{
RestrictInfo *rinfo2 = lfirst_node(RestrictInfo, lc2);
- remove_rel_from_restrictinfo(rinfo2, relid, ojrelid);
+ remove_rel_from_restrictinfo(rinfo2, relid, ojrelid, subst_relid);
}
}
else
{
RestrictInfo *rinfo2 = castNode(RestrictInfo, orarg);
- remove_rel_from_restrictinfo(rinfo2, relid, ojrelid);
+ remove_rel_from_restrictinfo(rinfo2, relid, ojrelid, subst_relid);
}
}
}
@@ -747,9 +814,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains right sides of baserestrictinfo clauses looks like
+ * x = const if distinctness is derived from such clauses, not joininfo clause.
+ * Pass NULL for the param value, if it is not interested.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -762,10 +834,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1079,9 +1152,21 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+bool
+innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids, Relids outerrelids,
+ RelOptInfo *innerrel, JoinType jointype,
+ List *restrictlist, bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1103,10 +1188,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1123,7 +1212,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1136,10 +1225,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1185,7 +1279,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1227,5 +1322,929 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+typedef struct ChangeVarnoContext
+{
+ Index oldRelid;
+ Index newRelid;
+} ChangeVarnoContext;
+
+
+static bool
+change_varno_walker(Node *node, ChangeVarnoContext *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == context->oldRelid)
+ {
+ var->varno = context->newRelid;
+ var->varnosyn = context->newRelid;
+ var->location = -1;
+ }
+ else if (var->varno == context->newRelid)
+ var->location = -1;
+
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ change_rinfo((RestrictInfo *) node, context->oldRelid, context->newRelid);
+ return false;
+ }
+ return expression_tree_walker(node, change_varno_walker, context);
+}
+
+/*
+ * For all Vars in the expression that have varno = oldRelid, set
+ * varno = newRelid.
+ */
+static void
+change_varno(Expr *expr, Index oldRelid, Index newRelid)
+{
+ ChangeVarnoContext context;
+
+ if (newRelid == 0)
+ return;
+
+ context.oldRelid = oldRelid;
+ context.newRelid = newRelid;
+ change_varno_walker((Node *) expr, &context);
+}
+
+/*
+ * Substitute newId for oldId in relids.
+ */
+static Bitmapset *
+change_relid(Relids relids, Index oldId, Index newId)
+{
+ if (newId == 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+static void
+change_rinfo(RestrictInfo *rinfo, Index from, Index to)
+{
+ bool is_req_equal;
+
+ if (to == 0)
+ return;
+
+ is_req_equal = (rinfo->required_relids == rinfo->clause_relids);
+
+ change_varno(rinfo->clause, from, to);
+ change_varno(rinfo->orclause, from, to);
+ rinfo->clause_relids = change_relid(rinfo->clause_relids, from, to);
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = change_relid(rinfo->required_relids, from, to);
+ rinfo->left_relids = change_relid(rinfo->left_relids, from, to);
+ rinfo->right_relids = change_relid(rinfo->right_relids, from, to);
+ rinfo->outer_relids = change_relid(rinfo->outer_relids, from, to);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ */
+static void
+update_ec_members(EquivalenceClass *ec, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(ec->ec_members);)
+ {
+ ListCell *cell = list_nth_cell(ec->ec_members, counter);
+ EquivalenceMember *em = lfirst(cell);
+ int counter1;
+
+ if (!bms_is_member(toRemove, em->em_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ em->em_relids = change_relid(em->em_relids, toRemove, toKeep);
+ /* We only process inner joins */
+ change_varno(em->em_expr, toRemove, toKeep);
+
+ /*
+ * After we switched the equivalence member to the remaining relation,
+ * check that it is not the same as the existing member, and if it is,
+ * delete it.
+ */
+ for (counter1 = 0; counter1 < list_length(ec->ec_members); counter1++)
+ {
+ EquivalenceMember *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(EquivalenceMember, list_nth(ec->ec_members, counter1));
+
+ if (equal(other->em_expr, em->em_expr))
+ break;
+ }
+
+ if (counter1 < list_length(ec->ec_members))
+ ec->ec_members = list_delete_cell(ec->ec_members, cell);
+ else
+ counter++;
+ }
+}
+
+/*
+ * Update EC sources to point to the remaining relation instead of the
+ * removed one.
+ */
+static void
+update_ec_sources(List **sources, Index toRemove, Index toKeep)
+{
+ int counter;
+
+ for (counter = 0; counter < list_length(*sources);)
+ {
+ ListCell *cell = list_nth_cell(*sources, counter);
+ RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+ int counter1;
+
+ if (!bms_is_member(toRemove, rinfo->required_relids))
+ {
+ counter++;
+ continue;
+ }
+
+ change_varno(rinfo->clause, toRemove, toKeep);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ for (counter1 = 0; counter1 < list_length(*sources); counter1++)
+ {
+ RestrictInfo *other;
+
+ if (counter1 == counter)
+ continue;
+
+ other = castNode(RestrictInfo, list_nth(*sources, counter1));
+ if (equal(rinfo->clause, other->clause))
+ break;
+ }
+
+ if (counter1 < list_length(*sources))
+ *sources = list_delete_cell(*sources, cell);
+ else
+ {
+ counter++;
+
+ /* We will keep this RestrictInfo, correct its relids. */
+ change_rinfo(rinfo, toRemove, toKeep);
+ }
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * The joinclauses list is destructively changed.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove)
+{
+ ListCell *cell;
+ int i;
+ List *target = NIL;
+
+ /*
+ * Include all eclass mentions of removed relation into the eclass
+ * mentions of kept relation.
+ */
+ toKeep->eclass_indexes = bms_add_members(toRemove->eclass_indexes,
+ toKeep->eclass_indexes);
+
+ /*
+ * Now, baserestrictinfo replenished with restrictions from removing
+ * relation. It is needed to remove duplicates and replace degenerated
+ * clauses with a NullTest.
+ */
+ foreach(cell, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, cell);
+ ListCell *otherCell = NULL;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->clause_relids));
+
+ /*
+ * If this clause is a mergejoinable equality clause that compares a
+ * variable to itself, i.e., has the form of "X=X", replace it with
+ * null test.
+ */
+ if (rinfo->mergeopfamilies && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *nullTest = makeNode(NullTest);
+
+ nullTest->arg = leftOp;
+ nullTest->nulltesttype = IS_NOT_NULL;
+ nullTest->argisrow = false;
+ nullTest->location = -1;
+ rinfo->clause = (Expr *) nullTest;
+ }
+ }
+
+ /* Search for duplicates. */
+ foreach(otherCell, target)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, otherCell);
+
+ if (other == rinfo ||
+ (rinfo->parent_ec != NULL
+ && other->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, other->clause))
+ {
+ break;
+ }
+ }
+
+ if (otherCell != NULL)
+ /* Duplicate found */
+ continue;
+
+ target = lappend(target, rinfo);
+ }
+
+ list_free(toKeep->baserestrictinfo);
+ toKeep->baserestrictinfo = target;
+
+ /*
+ * Update the equivalence classes that reference the removed relations.
+ */
+ foreach(cell, root->eq_classes)
+ {
+ EquivalenceClass *ec = lfirst(cell);
+
+ if (!bms_is_member(toRemove->relid, ec->ec_relids))
+ {
+ /*
+ * This EC doesn't reference the removed relation, nothing to be
+ * done for it.
+ */
+ continue;
+ }
+
+ /*
+ * Update the EC members to reference the remaining relation instead
+ * of the removed one.
+ */
+ update_ec_members(ec, toRemove->relid, toKeep->relid);
+ ec->ec_relids = change_relid(ec->ec_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * We will now update source and derived clauses of the EC.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+ update_ec_sources(&ec->ec_sources, toRemove->relid, toKeep->relid);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+ Assert(toRemove->reltarget->sortgrouprefs == 0);
+
+ foreach(cell, toRemove->reltarget->exprs)
+ {
+ Expr *node = lfirst(cell);
+
+ change_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ /* We didn't change toRemove during the remove_rel_from_query execution */
+ toRemove->attr_needed[attno] = change_relid(
+ toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ *
+ * Also make sure that the scratch->row_marks cache is up to date, because
+ * we are going to use it for further join removals.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ /*
+ * XXX: Why here we haven't any differences in removing rmark or
+ * kmark?
+ */
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Change varno in some special cases with non-trivial RangeTblEntry
+ */
+ foreach(cell, root->parse->rtable)
+ {
+ RangeTblEntry *rte = lfirst(cell);
+
+ switch (rte->rtekind)
+ {
+ case RTE_FUNCTION:
+ change_varno((Expr *) rte->functions, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_TABLEFUNC:
+ change_varno((Expr *) rte->tablefunc, toRemove->relid, toKeep->relid);
+ break;
+ case RTE_VALUES:
+ change_varno((Expr *) rte->values_lists, toRemove->relid, toKeep->relid);
+ break;
+ default:
+ /* no op */
+ break;
+ }
+ }
+
+ /*
+ * Replace varno in root targetlist and HAVING clause.
+ */
+ change_varno((Expr *) root->processed_tlist, toRemove->relid, toKeep->relid);
+ change_varno((Expr *) root->parse->havingQual, toRemove->relid, toKeep->relid);
+
+ /*
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Expr *leftexpr;
+ Expr *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) ? */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = (Expr *) copyObject(get_leftop(rinfo->clause));
+ rightexpr = (Expr *) copyObject(get_rightop(rinfo->clause));
+
+ /* Can't match of the exprs are not of the same type */
+ if (leftexpr->type != rightexpr->type)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ change_varno(rightexpr,
+ bms_next_member(rinfo->right_relids, -1),
+ bms_next_member(rinfo->left_relids, -1));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness [partly] derived from a baserestrictinfo
+ * clause. In this case we have a chance to return only one row (if such
+ * clauses on both sides of SJ is equal) or nothing - if they are different.
+ */
+static bool
+match_unique_clauses(RelOptInfo *outer, List *uclauses, Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ change_varno(clause, relid, outer->relid);
+
+ c1 = (rinfo->outer_is_left ? get_leftop(clause) : get_rightop(clause));
+ iclause = (rinfo->outer_is_left ?
+ get_rightop(clause) : get_leftop(clause));
+
+ /*
+ * Compare these left and right sides with corresponding sides of the
+ * outer's filters. If no one detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ oclause = (bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause));
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids joinrelids = NULL;
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ if (bms_num_members(relids) < 2)
+ return NULL;
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to optimize two relations if they belong to
+ * different rules of order restriction. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+
+ if (!jinfo_check)
+ continue;
+
+ /* Reuse joinrelids bitset to avoid reallocation. */
+ joinrelids = bms_del_members(joinrelids, joinrelids);
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * At this stage joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence on this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ outer->relids,
+ inner, 0);
+
+ /*
+ * Process restrictlist to seperate out the self join quals from
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals);
+
+ /* Is it an unique self join? */
+
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX: we would detect self-join without quals like 'x==x' if
+ * we had an foreign key constraint on some of other quals and
+ * this join haven't any columns from the outer in the target
+ * list. But it is still complex task.
+ */
+ continue;
+ }
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of join.
+ */
+ if (!match_unique_clauses(outer, uclauses, inner->relid))
+ continue;
+
+ /* See for row marks. */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+
+ /*
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+
+ if (lc)
+ continue;
+
+ /*
+ * We can remove either relation, so remove the outer one, to
+ * simplify this loop.
+ */
+
+ /*
+ * Attach the restrictions to removing relation to simplify relids
+ * replacing procedure. joininfo clauses of removing relation will
+ * be moved to the keeping one by the remove_rel_from_query().
+ */
+ outer->baserestrictinfo = list_concat(outer->baserestrictinfo,
+ restrictlist);
+
+ /* Firstly, replace index of excluding relation with the keeping one. */
+ remove_rel_from_query(root, outer->relid, 0, joinrelids, inner->relid);
+
+ /* Secondly, fix restrictions of keeping relation */
+ remove_self_join_rel(root, imark, omark, inner, outer);
+
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+static Relids
+remove_self_joins_one_level(PlannerInfo *root, Relids relids, Relids ToRemove)
+{
+ int i = -1;
+
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int j = i;
+
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((j = bms_next_member(relids, j)) >= 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[j];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, j);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+
+ Assert(bms_is_empty(relids));
+ return ToRemove;
+}
+
+/*
+ * For each level of joinlist form a set of base relations and launch the
+ * routine of the self-join removal optimization. Recurse into sub-joinlists to
+ * handle deeper levels.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+
+ /* Collect the ids of base relations at one level of the join tree. */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ RangeTblRef *ref = (RangeTblRef *) jlnode;
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ root->simple_rel_array[ref->rtindex] != NULL)
+ {
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+ }
+ }
+ else if (IsA(jlnode, List))
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ ToRemove);
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+
+ /*
+ * Limit the number of joins we process to control the
+ * quadratic behavior.
+ */
+ if (bms_num_members(relids) >= self_join_search_limit)
+ break;
+ }
+
+ /*
+ * remove_self_joins_one_level cleans the relids bitmapset by itself, so we
+ * should free it only if no any optimization has been made.
+ */
+ if (bms_num_members(relids) >= 2)
+ ToRemove = remove_self_joins_one_level(root, relids, ToRemove);
+ else
+ bms_free(relids);
+
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * We search for joins where the same relation is joined to itself on all
+ * columns of some unique index. If this condition holds, then, for
+ * each outer row, only one inner row matches, and it is the same row
+ * of the same relation. This allows us to remove the join and replace
+ * it with a scan that combines WHERE clauses from both sides. The join
+ * clauses themselves assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set. We check that both relation are made unique by the same
+ * unique index with the same clauses.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || list_length(joinlist) <= 1)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 7afd434c60..748e2a3ec8 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -228,6 +228,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 68aecad66f..dd019dfd78 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1022,6 +1022,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
@@ -2069,6 +2079,18 @@ struct config_int ConfigureNamesInt[] =
8, 1, INT_MAX,
NULL, NULL, NULL
},
+ {
+ {"self_join_search_limit", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Max size of a group of plain tables where planner "
+ "will lookup for self joins."),
+ gettext_noop("If a number of tables in FROM-list is exceeded this "
+ "limit, the planner will ignore tables beyond it."),
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &self_join_search_limit,
+ 32, 1, INT_MAX,
+ NULL, NULL, NULL
+ },
{
{"geqo_threshold", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Sets the threshold of FROM items beyond which GEQO is used."),
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 50bc3b503a..37f577e9b0 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -75,6 +75,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 5fc900737d..d26c0d3ea5 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,8 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
+extern PGDLLIMPORT int self_join_search_limit;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -104,6 +106,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 1fd75c8f58..2678f75bd9 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5888,6 +5888,780 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = (a * a))
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ Join Filter: (j1.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Nested Loop
+ Join Filter: (j1.a = j2.a)
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(9 rows)
+
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j2.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj j3
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+reset self_join_search_limit;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 001c6e7eb9..3062686616 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 84547c7dff..513fd791ca 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2182,6 +2182,346 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+reset self_join_search_limit;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two coies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 260854747b..fe69e9dbe4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -366,6 +366,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ChangeVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -3822,6 +3823,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.34.1
On 4/4/2023 02:30, Gregory Stark (as CFM) wrote:
On Mon, 6 Mar 2023 at 00:30, Michał Kłeczek <michal@kleczek.org> wrote:
Hi All,
I just wanted to ask about the status and plans for this patch.
I can see it being stuck at “Waiting for Author” status in several commit tests.Sadly it seems to now be badly in need of a rebase. There are large
hunks failing in the guts of analyzejoins.c as well as minor failures
elsewhere and lots of offsets which need to be reviewed.I think given the lack of activity it's out of time for this release
at this point. I'm moving it ahead to the next CF.
Hi,
Version 41 is heavily remade of the feature:
1. In previous versions, I tried to reuse remove_rel_from_query() for
both left and self-join removal. But for now, I realized that it is a
bit different procedures which treat different operations. In this
patch, only common stages of the PlannerInfo fixing process are united
in one function.
2. Transferring clauses from the removing table to keeping one is more
transparent now and contains comments.
3. Equivalence classes update procedure was changed according to David's
commit 3373c71. As I see, Tom has added remove_rel_from_eclass since the
last v.40 version, and it looks pretty similar to the update_eclass
routine in this patch.
It passes regression tests, but some questions are still open:
1. Should we look for duplicated or redundant clauses (the same for
eclasses) during the clause transfer procedure? On the one side, we
reduce the length of restrict lists that can impact planning or
executing time. Additionally, we improve the accuracy of cardinality
estimation. On the other side, it is one more place that can make
planning time much longer in specific cases. It would have been better
to avoid calling the equal() function here, but it's the only way to
detect duplicated inequality expressions.
2. Could we reuse ChangeVarNodes instead of sje_walker(), merge
remove_rel_from_restrictinfo with replace_varno?
3. Also, I still don't finish with the split_selfjoin_quals: some
improvements could be made.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
v41-0001-Remove-self-joins.patchtext/plain; charset=UTF-8; name=v41-0001-Remove-self-joins.patchDownload
From 4a342b9789f5be209318c13fb7ec336fcbd2aee5 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Mon, 15 May 2023 09:04:51 +0500
Subject: [PATCH] Remove self-joins.
A Self Join Elimination (SJE) feature removes inner join of plain table to itself
in a query tree if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation is deduced from baserestrictinfo clause like
'x=const' on unique field(s), check what the inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.
Some regression tests changed due to self-join removal logic.
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 38 +
src/backend/optimizer/plan/analyzejoins.c | 1077 ++++++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/utils/misc/guc_tables.c | 22 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 7 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 798 +++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 351 +++++++
src/tools/pgindent/typedefs.list | 2 +
13 files changed, 2319 insertions(+), 51 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6262cb7bb2..68215e1093 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5419,6 +5419,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ query tree and replaces self joins with semantically equivalent single
+ scans. Take into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 0065c8992b..57bdc6811f 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3491,6 +3491,21 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3546,6 +3561,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3597,6 +3613,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniquiness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3639,7 +3673,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 5f3cce873a..4f91c3a8ed 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,8 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
+#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -33,10 +35,27 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * extra_clauses contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for SJ checking procedure: SJ can
+ * be removed if outer relation contains strictly the same set of clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+bool enable_self_join_removal;
+int self_join_search_limit;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -44,14 +63,17 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static Bitmapset *replace_relid(Relids relids, int oldId, int newId);
+static void replace_varno(Node *node, int from, int to);
/*
@@ -89,7 +111,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -308,7 +330,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -328,21 +350,15 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* the planner's data structures that will actually be consulted later.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query_common(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Remove references to the rel from other baserels' attr_needed arrays.
*/
@@ -366,19 +382,22 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ replace_relid(otherrel->attr_needed[attroff], relid, subst);
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], ojrelid);
+ replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
+
+ /* Update lateral references. */
+ replace_varno((Node *) otherrel->lateral_vars, relid, subst);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -392,19 +411,19 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
}
/*
@@ -438,18 +457,37 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+}
+
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query_common(root, rel, -1, sjinfo, joinrelids);
/*
* Remove any joinquals referencing the rel from the joininfo lists.
@@ -844,9 +882,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains right sides of baserestrictinfo clauses looks like
+ * x = const if distinctness is derived from such clauses, not joininfo clause.
+ * Pass NULL for the param value, if it is not interested.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +902,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,9 +1220,21 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+bool
+innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids, Relids outerrelids,
+ RelOptInfo *innerrel, JoinType jointype,
+ List *restrictlist, bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1200,10 +1256,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1280,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1293,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1347,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1324,5 +1390,916 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+typedef struct ReplaceVarnoContext
+{
+ int from;
+ int to;
+} ReplaceVarnoContext;
+
+static bool
+replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+
+ if (bms_is_member(ctx->from, rinfo->clause_relids))
+ {
+ replace_varno((Node *) rinfo->clause, ctx->from, ctx->to);
+ replace_varno((Node *) rinfo->orclause, ctx->from, ctx->to);
+ rinfo->clause_relids = replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
+ rinfo->left_relids = replace_relid(rinfo->left_relids, ctx->from, ctx->to);
+ rinfo->right_relids = replace_relid(rinfo->right_relids, ctx->from, ctx->to);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = replace_relid(rinfo->required_relids, ctx->from, ctx->to);
+
+ rinfo->outer_relids = replace_relid(rinfo->outer_relids, ctx->from, ctx->to);
+ rinfo->incompatible_relids = replace_relid(rinfo->incompatible_relids, ctx->from, ctx->to);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ relid == ctx->to && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+
+ return false;
+ }
+ return expression_tree_walker(node, replace_varno_walker, (void *) ctx);
+}
+
+/*
+ * Replace each occurence of removing relid with the keeping one
+ */
+static void
+replace_varno(Node *node, int from , int to)
+{
+ ReplaceVarnoContext ctx;
+
+ if (to <= 0)
+ return;
+
+ ctx.from = from;
+ ctx.to = to;
+ replace_varno_walker(node, &ctx);
+}
+
+/*
+ * Substitute newId by oldId in relids.
+ */
+static Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (oldId < 0)
+ return relids;
+
+ if (newId < 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclass(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+ ListCell *lc;
+
+ foreach(lc, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+
+ /* We only process inner joins */
+ replace_varno((Node *) em->em_expr, from, to);
+
+ foreach(lc, new_members)
+ {
+ EquivalenceMember *other = lfirst_node(EquivalenceMember, lc);
+
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach(lc, ec->ec_sources)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ replace_varno((Node *) rinfo, from, to);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach(lc, new_sources)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, lc);
+
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+static bool
+sje_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ return expression_tree_walker(node, sje_walker, (void *) ctx);
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+ ReplaceVarnoContext ctx = {.from = toRemove->relid, .to = toKeep->relid};
+
+ /*
+ * Replace index of removing table with the keeping one. The technique of
+ * removing/distributing restrictinfo is used here to attach just appeared
+ * (for keeping relation) join clauses and avoid of adding duplicates of
+ * those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach(lc, joininfos)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the removing
+ * table just to simplify replacement procedure: all of them weren't
+ * connected to any keeping relations and need to be added to some rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach(lc, toRemove->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side we reduce length of restrict
+ * lists that can impact planning or executing time. Additionally, we
+ * improve accuracy of cardinality estimation. On the other side, it is one
+ * more place which can make planning time much longer in specific cases.
+ * It would have been better to avoid calling the equal() function here,
+ * but it's only way to detect duplicated inequality expressions.
+ */
+
+ foreach(lc, binfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach(lc, jinfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->joininfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing table, with the keeping
+ * one: varno of removin table should be replaced in members and sources
+ * lists. Also, remove duplicated elements if this replacement procedure
+ * created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclass(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+
+ replace_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(
+ toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures */
+ query_tree_walker(root->parse, sje_walker, &ctx, QTW_EXAMINE_SORTGROUP);
+
+ /* At last, replace links in the planner info */
+ remove_rel_from_query_common(root, toRemove, toKeep->relid, NULL, NULL);
+ /* replace varno in root targetlist and HAVING clause */
+ replace_varno((Node *) root->processed_tlist,
+ toRemove->relid, toKeep->relid);
+ replace_varno((Node *) root->processed_groupClause,
+ toRemove->relid, toKeep->relid);
+ /* ... and remove the rel from the baserel array */
+ root->simple_rel_array[toRemove->relid] = NULL;
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_num_members(rinfo->left_relids) != 1 ||
+ bms_num_members(rinfo->right_relids) != 1)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite expensive operation, narrowing the use case. For example, when
+ * we have cast of the same var to different (but compatible) types.
+ */
+ replace_varno(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness [partly] derived from a baserestrictinfo
+ * clause. In this case we have a chance to return only one row (if such
+ * clauses on both sides of SJ is equal) or nothing - if they are different.
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses, Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ replace_varno((Node *) clause, relid, outer->relid);
+
+ c1 = (rinfo->outer_is_left ? get_leftop(clause) : get_rightop(clause));
+ iclause = (rinfo->outer_is_left ?
+ get_rightop(clause) : get_leftop(clause));
+
+ /*
+ * Compare these left and right sides with corresponding sides of the
+ * outer's filters. If no one detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ oclause = (bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause));
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence.
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+ if (lc)
+ continue;
+
+ /*
+ * At this stage joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence on this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, 0);
+
+ /*
+ * Process restrictlist to seperate the self join quals out of
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins. To avoid complexity, limit the max power of this set by a GUC.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ RangeTblRef *ref = (RangeTblRef *) jlnode;
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ root->simple_rel_array[ref->rtindex] != NULL)
+ {
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+ }
+ }
+ else if (IsA(jlnode, List))
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ ToRemove);
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+
+ /*
+ * Limit the number of joins we process to control the
+ * quadratic behavior.
+ */
+ if (bms_num_members(relids) >= self_join_search_limit)
+ break;
+ }
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+ if (bms_num_members(relids) >= 2)
+ {
+ int i = -1;
+
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int j = i;
+
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((j = bms_next_member(relids, j)) >= 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[j];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, j);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+
+ bms_free(group);
+ }
+
+ Assert(bms_is_empty(relids));
+ }
+
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause for
+ * each tuple from one side of the join is proven to match the same physical row
+ * (or nothing) on the other side, that self-join can be eliminated from the
+ * query.
+ * Suitable join clauses assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+
+ if (unlikely(ToRemove != NULL))
+ {
+ int nremoved = 0;
+
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 7afd434c60..748e2a3ec8 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -228,6 +228,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 71e27f8eb0..4c1d4003f1 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1017,6 +1017,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
@@ -2064,6 +2074,18 @@ struct config_int ConfigureNamesInt[] =
8, 1, INT_MAX,
NULL, NULL, NULL
},
+ {
+ {"self_join_search_limit", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Max size of a group of plain tables where planner "
+ "will lookup for self joins."),
+ gettext_noop("If a number of tables in FROM-list is exceeded this "
+ "limit, the planner will ignore tables beyond it."),
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &self_join_search_limit,
+ 32, 1, INT_MAX,
+ NULL, NULL, NULL
+ },
{
{"geqo_threshold", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Sets the threshold of FROM items beyond which GEQO is used."),
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 50bc3b503a..37f577e9b0 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -75,6 +75,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 5fc900737d..d26c0d3ea5 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,8 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
+extern PGDLLIMPORT int self_join_search_limit;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -104,6 +106,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9b8638f286..b1f43f6ff8 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6134,6 +6134,804 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = (a * a))
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ Join Filter: (j1.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Nested Loop
+ Join Filter: (j1.a = j2.a)
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(9 rows)
+
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j2.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj j3
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+reset self_join_search_limit;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 001c6e7eb9..3062686616 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3e5032b04d..a170a9bbd0 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2309,6 +2309,357 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+reset self_join_search_limit;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 260854747b..edbe6787d0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -366,6 +366,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -3822,6 +3823,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.41.0
Hi,
During the significant code revision in v.41 I lost some replacement
operations. Here is the fix and extra tests to check this in the future.
Also, Tom added the JoinDomain structure five months ago, and I added
code to replace relids for that place too.
One more thing, I found out that we didn't replace SJs, defined by
baserestrictinfos if no one self-join clause have existed for the join.
Now, it is fixed, and the test has been added.
To understand changes readily, see the delta file in the attachment.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
v41-v42-delta.txttext/plain; charset=UTF-8; name=v41-v42-delta.txtDownload
From 8f5a432f6fbbcad1fd2937f33af09e9328690b6b Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Tue, 4 Jul 2023 16:07:50 +0700
Subject: [PATCH] Add lost arrangements of relids and varnos. Add the test to
check it. Add one more cleaning procedure on JoinDomain relids which was
introduced recently with commit 3bef56e. Fix the corner case when we haven't
removed SJ if the selfjoinquals list was empty.
---
src/backend/optimizer/plan/analyzejoins.c | 15 ++++++++++-
src/test/regress/expected/join.out | 26 ++++++++++++++++---
src/test/regress/expected/updatable_views.out | 17 +++++-------
src/test/regress/sql/join.sql | 9 +++++++
4 files changed, 53 insertions(+), 14 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index a93e4ce05c..15234b7a3b 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -424,6 +424,8 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo *rel,
sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst);
}
/*
@@ -465,6 +467,8 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo *rel,
/* ph_needed might or might not become empty */
phv->phrels = replace_relid(phv->phrels, relid, subst);
phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ phinfo->ph_var->phrels = replace_relid(phinfo->ph_var->phrels, relid, subst);
Assert(!bms_is_empty(phv->phrels));
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
@@ -1545,6 +1549,7 @@ update_eclass(EquivalenceClass *ec, int from, int to)
}
em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
/* We only process inner joins */
replace_varno((Node *) em->em_expr, from, to);
@@ -2101,7 +2106,7 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
*/
restrictlist = generate_join_implied_equalities(root, joinrelids,
inner->relids,
- outer, 0);
+ outer, NULL);
/*
* Process restrictlist to seperate the self join quals out of
@@ -2111,6 +2116,14 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
split_selfjoin_quals(root, restrictlist, &selfjoinquals,
&otherjoinquals, inner->relid, outer->relid);
+ /*
+ * To enable SJE for the only degenerate case without any self join
+ * clauses at all, add baserestrictinfo to this list.
+ * Degenerate case works only if both sides have the same clause. So
+ * doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
/*
* Determine if the inner table can duplicate outer rows. We must
* bypass the unique rel cache here since we're possibly using a
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index b1f43f6ff8..027c356bcc 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5807,11 +5807,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -6327,6 +6329,24 @@ on true;
-> Seq Scan on int8_tbl y
(7 rows)
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+---------------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: ((oid < '10'::oid) AND (oid IS NOT NULL))
+(6 rows)
+
--
-- SJR corner case: uniqueness of an inner is [partially] derived from
-- baserestrictinfo clauses.
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1950e6f281..a73c1f90c4 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2499,16 +2499,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
- Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ QUERY PLAN
+--------------------------------------------------
+ Update on base_tbl
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index a170a9bbd0..1a64363837 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2394,6 +2394,15 @@ left join (select coalesce(y.q1, 1) from int8_tbl y
on true) z
on true;
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
--
-- SJR corner case: uniqueness of an inner is [partially] derived from
-- baserestrictinfo clauses.
--
2.41.0
v42-0001-Remove-self-joins.patchtext/plain; charset=UTF-8; name=v42-0001-Remove-self-joins.patchDownload
From f25cb2f1d196a8a1ff61463c0aa5dc673f7251eb Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Date: Mon, 15 May 2023 09:04:51 +0500
Subject: [PATCH] Remove self-joins.
Self Join Elimination (SJE) feature removes an inner join of a plain table to
itself in the query tree if is proved that the join can be replaced with
a scan without impact to the query result.
Self join and inner relation are replaced with the outer in query, equivalence
classes and planner info structures. Also, inner restrictlist moves to the
outer one with removing duplicated clauses. Thus, this optimization reduces
length of range table list (especially it make sense for partitioned relations),
reduces number of restriction clauses === selectivity estimations and
potentially can improve total planner prediction for the query.
The SJE proof based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row if:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Add to the list above baseretrictinfo of inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip this
pair of joining tables.
4. Check uniqueness, proved by baserestrictinfo clauses. To prove possibility
of the self-join elimination inner and outer clauses must have exact match.
Relation replacement procedure is not trivial and it is merged with the one,
used to remove useless left joins.
Tests, covering this feature, added to the join.sql.
Some regression tests changed due to self-join removal logic.
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 38 +
src/backend/optimizer/plan/analyzejoins.c | 1092 ++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/utils/misc/guc_tables.c | 22 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 7 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 824 ++++++++++++-
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/expected/updatable_views.out | 17 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 360 ++++++
src/tools/pgindent/typedefs.list | 2 +
14 files changed, 2373 insertions(+), 64 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6262cb7bb2..68215e1093 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5419,6 +5419,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ query tree and replaces self joins with semantically equivalent single
+ scans. Take into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 6a93d767a5..508285d1ef 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,21 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3549,6 +3564,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3600,6 +3616,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniquiness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3642,7 +3676,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 5f3cce873a..15234b7a3b 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,8 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
+#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -33,10 +35,27 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * extra_clauses contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for SJ checking procedure: SJ can
+ * be removed if outer relation contains strictly the same set of clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+bool enable_self_join_removal;
+int self_join_search_limit;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -44,14 +63,17 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static Bitmapset *replace_relid(Relids relids, int oldId, int newId);
+static void replace_varno(Node *node, int from, int to);
/*
@@ -89,7 +111,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -308,7 +330,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -328,21 +350,15 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* the planner's data structures that will actually be consulted later.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query_common(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Remove references to the rel from other baserels' attr_needed arrays.
*/
@@ -366,19 +382,22 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ replace_relid(otherrel->attr_needed[attroff], relid, subst);
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], ojrelid);
+ replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
+
+ /* Update lateral references. */
+ replace_varno((Node *) otherrel->lateral_vars, relid, subst);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -392,19 +411,21 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst);
}
/*
@@ -438,18 +459,39 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ phinfo->ph_var->phrels = replace_relid(phinfo->ph_var->phrels, relid, subst);
Assert(!bms_is_empty(phv->phrels));
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+}
+
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query_common(root, rel, -1, sjinfo, joinrelids);
/*
* Remove any joinquals referencing the rel from the joininfo lists.
@@ -844,9 +886,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains right sides of baserestrictinfo clauses looks like
+ * x = const if distinctness is derived from such clauses, not joininfo clause.
+ * Pass NULL for the param value, if it is not interested.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +906,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,9 +1224,21 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+bool
+innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids, Relids outerrelids,
+ RelOptInfo *innerrel, JoinType jointype,
+ List *restrictlist, bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1200,10 +1260,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1284,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1297,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1351,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1324,5 +1394,927 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+typedef struct ReplaceVarnoContext
+{
+ int from;
+ int to;
+} ReplaceVarnoContext;
+
+static bool
+replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+
+ if (bms_is_member(ctx->from, rinfo->clause_relids))
+ {
+ replace_varno((Node *) rinfo->clause, ctx->from, ctx->to);
+ replace_varno((Node *) rinfo->orclause, ctx->from, ctx->to);
+ rinfo->clause_relids = replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
+ rinfo->left_relids = replace_relid(rinfo->left_relids, ctx->from, ctx->to);
+ rinfo->right_relids = replace_relid(rinfo->right_relids, ctx->from, ctx->to);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = replace_relid(rinfo->required_relids, ctx->from, ctx->to);
+
+ rinfo->outer_relids = replace_relid(rinfo->outer_relids, ctx->from, ctx->to);
+ rinfo->incompatible_relids = replace_relid(rinfo->incompatible_relids, ctx->from, ctx->to);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ relid == ctx->to && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+
+ return false;
+ }
+ return expression_tree_walker(node, replace_varno_walker, (void *) ctx);
+}
+
+/*
+ * Replace each occurence of removing relid with the keeping one
+ */
+static void
+replace_varno(Node *node, int from , int to)
+{
+ ReplaceVarnoContext ctx;
+
+ if (to <= 0)
+ return;
+
+ ctx.from = from;
+ ctx.to = to;
+ replace_varno_walker(node, &ctx);
+}
+
+/*
+ * Substitute newId by oldId in relids.
+ */
+static Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (oldId < 0)
+ return relids;
+
+ if (newId < 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclass(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+ ListCell *lc;
+
+ foreach(lc, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ replace_varno((Node *) em->em_expr, from, to);
+
+ foreach(lc, new_members)
+ {
+ EquivalenceMember *other = lfirst_node(EquivalenceMember, lc);
+
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach(lc, ec->ec_sources)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ replace_varno((Node *) rinfo, from, to);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach(lc, new_sources)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, lc);
+
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+static bool
+sje_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ return expression_tree_walker(node, sje_walker, (void *) ctx);
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+ ReplaceVarnoContext ctx = {.from = toRemove->relid, .to = toKeep->relid};
+
+ /*
+ * Replace index of removing table with the keeping one. The technique of
+ * removing/distributing restrictinfo is used here to attach just appeared
+ * (for keeping relation) join clauses and avoid of adding duplicates of
+ * those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach(lc, joininfos)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the removing
+ * table just to simplify replacement procedure: all of them weren't
+ * connected to any keeping relations and need to be added to some rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach(lc, toRemove->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side we reduce length of restrict
+ * lists that can impact planning or executing time. Additionally, we
+ * improve accuracy of cardinality estimation. On the other side, it is one
+ * more place which can make planning time much longer in specific cases.
+ * It would have been better to avoid calling the equal() function here,
+ * but it's only way to detect duplicated inequality expressions.
+ */
+
+ foreach(lc, binfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach(lc, jinfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->joininfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing table, with the keeping
+ * one: varno of removin table should be replaced in members and sources
+ * lists. Also, remove duplicated elements if this replacement procedure
+ * created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclass(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+
+ replace_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(
+ toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures */
+ query_tree_walker(root->parse, sje_walker, &ctx, QTW_EXAMINE_SORTGROUP);
+
+ /* At last, replace links in the planner info */
+ remove_rel_from_query_common(root, toRemove, toKeep->relid, NULL, NULL);
+ /* replace varno in root targetlist and HAVING clause */
+ replace_varno((Node *) root->processed_tlist,
+ toRemove->relid, toKeep->relid);
+ replace_varno((Node *) root->processed_groupClause,
+ toRemove->relid, toKeep->relid);
+ /* ... and remove the rel from the baserel array */
+ root->simple_rel_array[toRemove->relid] = NULL;
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite expensive operation, narrowing the use case. For example, when
+ * we have cast of the same var to different (but compatible) types.
+ */
+ replace_varno(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness [partly] derived from a baserestrictinfo
+ * clause. In this case we have a chance to return only one row (if such
+ * clauses on both sides of SJ is equal) or nothing - if they are different.
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ replace_varno((Node *) clause, relid, outer->relid);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with corresponding sides of the
+ * outer's filters. If no one detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence.
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+ if (lc)
+ continue;
+
+ /*
+ * At this stage joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence on this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+
+ /*
+ * Process restrictlist to seperate the self join quals out of
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ /*
+ * To enable SJE for the only degenerate case without any self join
+ * clauses at all, add baserestrictinfo to this list.
+ * Degenerate case works only if both sides have the same clause. So
+ * doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins. To avoid complexity, limit the max power of this set by a GUC.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ RangeTblRef *ref = (RangeTblRef *) jlnode;
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ root->simple_rel_array[ref->rtindex] != NULL)
+ {
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+ }
+ }
+ else if (IsA(jlnode, List))
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ ToRemove);
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+
+ /*
+ * Limit the number of joins we process to control the
+ * quadratic behavior.
+ */
+ if (bms_num_members(relids) >= self_join_search_limit)
+ break;
+ }
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+ if (bms_num_members(relids) >= 2)
+ {
+ int i = -1;
+
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int j = i;
+
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((j = bms_next_member(relids, j)) >= 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[j];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, j);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+
+ bms_free(group);
+ }
+
+ Assert(bms_is_empty(relids));
+ }
+
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause for
+ * each tuple from one side of the join is proven to match the same physical row
+ * (or nothing) on the other side, that self-join can be eliminated from the
+ * query.
+ * Suitable join clauses assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+
+ if (unlikely(ToRemove != NULL))
+ {
+ int nremoved = 0;
+
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 7afd434c60..748e2a3ec8 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -228,6 +228,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index f8ef87d26d..d5bc41824b 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1017,6 +1017,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
@@ -2064,6 +2074,18 @@ struct config_int ConfigureNamesInt[] =
8, 1, INT_MAX,
NULL, NULL, NULL
},
+ {
+ {"self_join_search_limit", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Max size of a group of plain tables where planner "
+ "will lookup for self joins."),
+ gettext_noop("If a number of tables in FROM-list is exceeded this "
+ "limit, the planner will ignore tables beyond it."),
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &self_join_search_limit,
+ 32, 1, INT_MAX,
+ NULL, NULL, NULL
+ },
{
{"geqo_threshold", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Sets the threshold of FROM items beyond which GEQO is used."),
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 50bc3b503a..37f577e9b0 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -75,6 +75,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 5fc900737d..d26c0d3ea5 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,8 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
+extern PGDLLIMPORT int self_join_search_limit;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -104,6 +106,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9b8638f286..027c356bcc 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5807,11 +5807,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -6134,6 +6136,822 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+---------------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: ((oid < '10'::oid) AND (oid IS NOT NULL))
+(6 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = (a * a))
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ Join Filter: (j1.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Nested Loop
+ Join Filter: (j1.a = j2.a)
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(9 rows)
+
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j2.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj j3
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+reset self_join_search_limit;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 001c6e7eb9..3062686616 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1950e6f281..a73c1f90c4 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2499,16 +2499,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
- Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ QUERY PLAN
+--------------------------------------------------
+ Update on base_tbl
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3e5032b04d..1a64363837 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2309,6 +2309,366 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+reset self_join_search_limit;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e941fb6c82..dcae2f598c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -366,6 +366,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -3824,6 +3825,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.41.0
On 5/7/2023 21:28, Andrey Lepikhov wrote:
Hi,
During the significant code revision in v.41 I lost some replacement
operations. Here is the fix and extra tests to check this in the future.
Also, Tom added the JoinDomain structure five months ago, and I added
code to replace relids for that place too.
One more thing, I found out that we didn't replace SJs, defined by
baserestrictinfos if no one self-join clause have existed for the join.
Now, it is fixed, and the test has been added.
To understand changes readily, see the delta file in the attachment.
Here is new patch in attachment. Rebased on current master and some
minor gaffes are fixed.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
v43-0001-Remove-self-joins.patchtext/plain; charset=UTF-8; name=v43-0001-Remove-self-joins.patchDownload
From 70bb5cf3d11b2797f1a9c7b04740435135229d29 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Tue, 12 Sep 2023 18:25:51 +0700
Subject: [PATCH] Remove self-joins.
Self Join Elimination (SJE) feature removes an inner join of a plain table to
itself in the query tree if is proved that the join can be replaced with
a scan without impact to the query result.
Self join and inner relation are replaced with the outer in query, equivalence
classes and planner info structures. Also, inner restrictlist moves to the
outer one with removing duplicated clauses. Thus, this optimization reduces
length of range table list (especially it make sense for partitioned relations),
reduces number of restriction clauses === selectivity estimations and
potentially can improve total planner prediction for the query.
The SJE proof based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Add to the list above baseretrictinfo of inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip this
pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove possibility
of the self-join elimination inner and outer clauses must have exact match.
Relation replacement procedure is not trivial and it is partly combined with the one,
used to remove useless left joins.
Tests, covering this feature, added to the join.sql.
Some regression tests changed due to self-join removal logic.
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 38 +
src/backend/optimizer/plan/analyzejoins.c | 1094 ++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/utils/misc/guc_tables.c | 22 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 7 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 824 ++++++++++++-
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/expected/updatable_views.out | 17 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 360 ++++++
src/tools/pgindent/typedefs.list | 2 +
14 files changed, 2375 insertions(+), 64 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bc1b215db..43c07b0d3b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5299,6 +5299,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ query tree and replaces self joins with semantically equivalent single
+ scans. Take into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 6a93d767a5..508285d1ef 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,21 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3549,6 +3564,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3600,6 +3616,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniquiness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3642,7 +3676,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 5f3cce873a..e09951f32b 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,8 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
+#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -33,10 +35,27 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * extra_clauses contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for SJ checking procedure: SJ can
+ * be removed if outer relation contains strictly the same set of clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+bool enable_self_join_removal;
+int self_join_search_limit;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -44,14 +63,17 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static Bitmapset *replace_relid(Relids relids, int oldId, int newId);
+static void replace_varno(Node *node, int from, int to);
/*
@@ -89,7 +111,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -308,7 +330,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -328,21 +350,15 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* the planner's data structures that will actually be consulted later.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query_common(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Remove references to the rel from other baserels' attr_needed arrays.
*/
@@ -366,19 +382,22 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ replace_relid(otherrel->attr_needed[attroff], relid, subst);
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], ojrelid);
+ replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
+
+ /* Update lateral references. */
+ replace_varno((Node *) otherrel->lateral_vars, relid, subst);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -392,19 +411,21 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst);
}
/*
@@ -438,18 +459,39 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ phinfo->ph_var->phrels = replace_relid(phinfo->ph_var->phrels, relid, subst);
Assert(!bms_is_empty(phv->phrels));
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+}
+
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query_common(root, rel, -1, sjinfo, joinrelids);
/*
* Remove any joinquals referencing the rel from the joininfo lists.
@@ -844,9 +886,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains right sides of baserestrictinfo clauses looks like
+ * x = const if distinctness is derived from such clauses, not joininfo clause.
+ * Pass NULL for the param value, if it is not interested.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +906,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,9 +1224,21 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+bool
+innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids, Relids outerrelids,
+ RelOptInfo *innerrel, JoinType jointype,
+ List *restrictlist, bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1200,10 +1260,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1284,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1297,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1351,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1324,5 +1394,929 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+typedef struct ReplaceVarnoContext
+{
+ int from;
+ int to;
+} ReplaceVarnoContext;
+
+static bool
+replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+
+ if (bms_is_member(ctx->from, rinfo->clause_relids))
+ {
+ replace_varno((Node *) rinfo->clause, ctx->from, ctx->to);
+ replace_varno((Node *) rinfo->orclause, ctx->from, ctx->to);
+ rinfo->clause_relids = replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
+ rinfo->left_relids = replace_relid(rinfo->left_relids, ctx->from, ctx->to);
+ rinfo->right_relids = replace_relid(rinfo->right_relids, ctx->from, ctx->to);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = replace_relid(rinfo->required_relids, ctx->from, ctx->to);
+
+ rinfo->outer_relids = replace_relid(rinfo->outer_relids, ctx->from, ctx->to);
+ rinfo->incompatible_relids = replace_relid(rinfo->incompatible_relids, ctx->from, ctx->to);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ relid == ctx->to && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+
+ return false;
+ }
+ return expression_tree_walker(node, replace_varno_walker, (void *) ctx);
+}
+
+/*
+ * Replace each occurence of removing relid with the keeping one
+ */
+static void
+replace_varno(Node *node, int from , int to)
+{
+ ReplaceVarnoContext ctx;
+
+ if (to <= 0)
+ return;
+
+ ctx.from = from;
+ ctx.to = to;
+ replace_varno_walker(node, &ctx);
+}
+
+/*
+ * Substitute newId by oldId in relids.
+ */
+static Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (oldId < 0)
+ return relids;
+
+ if (newId < 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC, because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclass(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+ ListCell *lc;
+ ListCell *lc1;
+
+ foreach(lc, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ replace_varno((Node *) em->em_expr, from, to);
+
+ foreach(lc1, new_members)
+ {
+ EquivalenceMember *other = lfirst_node(EquivalenceMember, lc1);
+
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach(lc, ec->ec_sources)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ replace_varno((Node *) rinfo, from, to);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach(lc1, new_sources)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, lc1);
+
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+static bool
+sje_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ return expression_tree_walker(node, sje_walker, (void *) ctx);
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+ ReplaceVarnoContext ctx = {.from = toRemove->relid, .to = toKeep->relid};
+
+ /*
+ * Replace index of removing table with the keeping one. The technique of
+ * removing/distributing restrictinfo is used here to attach just appeared
+ * (for keeping relation) join clauses and avoid of adding duplicates of
+ * those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach(lc, joininfos)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the removing
+ * table just to simplify replacement procedure: all of them weren't
+ * connected to any keeping relations and need to be added to some rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach(lc, toRemove->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side we reduce length of restrict
+ * lists that can impact planning or executing time. Additionally, we
+ * improve accuracy of cardinality estimation. On the other side, it is one
+ * more place which can make planning time much longer in specific cases.
+ * It would have been better to avoid calling the equal() function here,
+ * but it's only way to detect duplicated inequality expressions.
+ */
+
+ foreach(lc, binfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach(lc, jinfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->joininfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing table, with the keeping
+ * one: varno of removin table should be replaced in members and sources
+ * lists. Also, remove duplicated elements if this replacement procedure
+ * created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclass(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+
+ replace_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(
+ toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures */
+ query_tree_walker(root->parse, sje_walker, &ctx, QTW_EXAMINE_SORTGROUP);
+
+ /* At last, replace links in the planner info */
+ remove_rel_from_query_common(root, toRemove, toKeep->relid, NULL, NULL);
+ /* replace varno in root targetlist and HAVING clause */
+ replace_varno((Node *) root->processed_tlist,
+ toRemove->relid, toKeep->relid);
+ replace_varno((Node *) root->processed_groupClause,
+ toRemove->relid, toKeep->relid);
+ /* ... and remove the rel from the baserel array */
+ root->simple_rel_array[toRemove->relid] = NULL;
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' building two lists, one with a list of quals
+ * where the columns/exprs on either side of the join match and another
+ * list containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite expensive operation, narrowing the use case. For example, when
+ * we have cast of the same var to different (but compatible) types.
+ */
+ replace_varno(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness [partly] derived from a baserestrictinfo
+ * clause. In this case we have a chance to return only one row (if such
+ * clauses on both sides of SJ is equal) or nothing - if they are different.
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ replace_varno((Node *) clause, relid, outer->relid);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with corresponding sides of the
+ * outer's filters. If no one detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't
+ * be able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence.
+ * We can't remove the join if the relations have row marks of
+ * different strength (e.g. one is locked FOR UPDATE and another
+ * just has ROW_MARK_REFERENCE for EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+ if (lc)
+ continue;
+
+ /*
+ * At this stage joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence on this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+
+ /*
+ * Process restrictlist to seperate the self join quals out of
+ * the other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ /*
+ * To enable SJE for the only degenerate case without any self join
+ * clauses at all, add baserestrictinfo to this list.
+ * Degenerate case works only if both sides have the same clause. So
+ * doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' = true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner, to
+ * simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins. To avoid complexity, limit the max power of this set by a GUC.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids ToRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ RangeTblRef *ref = (RangeTblRef *) jlnode;
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ root->simple_rel_array[ref->rtindex] != NULL)
+ {
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+ }
+ }
+ else if (IsA(jlnode, List))
+ /* Recursively go inside the sub-joinlist */
+ ToRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ ToRemove);
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+
+ /*
+ * Limit the number of joins we process to control the
+ * quadratic behavior.
+ */
+ if (bms_num_members(relids) >= self_join_search_limit)
+ break;
+ }
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and launch
+ * the routine that detects self-joins in this group and removes excessive
+ * range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids list.
+ * So each next iteration of the cycle will involve less and less value of
+ * relids.
+ */
+ if (bms_num_members(relids) >= 2)
+ {
+ int i = -1;
+
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ Relids group = NULL;
+ Oid groupOid;
+ int j = i;
+
+ groupOid = root->simple_rte_array[i]->relid;
+ Assert(OidIsValid(groupOid));
+ group = bms_add_member(group, i);
+
+ /* Create group of relation indexes with the same oid. */
+ while ((j = bms_next_member(relids, j)) >= 0)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[j];
+
+ Assert(OidIsValid(rte->relid));
+
+ if (rte->relid == groupOid)
+ group = bms_add_member(group, j);
+ }
+
+ relids = bms_del_members(relids, group);
+ ToRemove = bms_add_members(ToRemove,
+ remove_self_joins_one_group(root, group));
+
+ bms_free(group);
+ }
+
+ Assert(bms_is_empty(relids));
+ }
+
+ return ToRemove;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause for
+ * each tuple from one side of the join is proven to match the same physical row
+ * (or nothing) on the other side, that self-join can be eliminated from the
+ * query.
+ * Suitable join clauses assume the form of X = X and can be replaced with
+ * NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner, but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as
+ * dead, and rewrite all references to it to point to the remaining
+ * relation. This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids ToRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal)
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ ToRemove = remove_self_joins_recurse(root, joinlist, ToRemove);
+
+ if (unlikely(ToRemove != NULL))
+ {
+ int nremoved = 0;
+
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(ToRemove, relid)) >= 0)
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index fcc0eacd25..be5ef79af3 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -231,6 +231,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index bdb26e2b77..22a498bc21 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1026,6 +1026,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
@@ -2067,6 +2077,18 @@ struct config_int ConfigureNamesInt[] =
8, 1, INT_MAX,
NULL, NULL, NULL
},
+ {
+ {"self_join_search_limit", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Max size of a group of plain tables where planner "
+ "will lookup for self joins."),
+ gettext_noop("If a number of tables in FROM-list is exceeded this "
+ "limit, the planner will ignore tables beyond it."),
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &self_join_search_limit,
+ 32, 1, INT_MAX,
+ NULL, NULL, NULL
+ },
{
{"geqo_threshold", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Sets the threshold of FROM items beyond which GEQO is used."),
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 50bc3b503a..37f577e9b0 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -75,6 +75,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 31c188176b..5f45ffda57 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,8 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
+extern PGDLLIMPORT int self_join_search_limit;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -104,6 +106,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..de71441052 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9b8638f286..027c356bcc 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5807,11 +5807,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -6134,6 +6136,822 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+---------------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: ((oid < '10'::oid) AND (oid IS NOT NULL))
+(6 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = (a * a))
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ Join Filter: (j1.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Nested Loop
+ Join Filter: (j1.a = j2.a)
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(9 rows)
+
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j2.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj j3
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+reset self_join_search_limit;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index aae5d51e1c..271313ebf8 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1950e6f281..a73c1f90c4 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2499,16 +2499,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
- Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ QUERY PLAN
+--------------------------------------------------
+ Update on base_tbl
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3e5032b04d..1a64363837 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2309,6 +2309,366 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+reset self_join_search_limit;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f2af84d7ca..fbad9c9a2a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -367,6 +367,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -3830,6 +3831,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.42.0
Hi!
I think this is a neat optimization.
On Tue, Sep 12, 2023 at 4:58 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
On 5/7/2023 21:28, Andrey Lepikhov wrote:
During the significant code revision in v.41 I lost some replacement
operations. Here is the fix and extra tests to check this in the future.
Also, Tom added the JoinDomain structure five months ago, and I added
code to replace relids for that place too.
One more thing, I found out that we didn't replace SJs, defined by
baserestrictinfos if no one self-join clause have existed for the join.
Now, it is fixed, and the test has been added.
To understand changes readily, see the delta file in the attachment.Here is new patch in attachment. Rebased on current master and some
minor gaffes are fixed.
I went through the thread and I think the patch gets better shape. A
couple of notes from my side.
1) Why replace_relid() makes a copy of lids only on insert/replace of
a member, but performs deletion in-place?
2) It would be nice to skip the insertion of IS NOT NULL checks when
they are not necessary. [1] points that infrastructure from [2] might
be useful. The patchset from [2] seems committed mow. However, I
can't see it is directly helpful in this matter. Could we just skip
adding IS NOT NULL clause for the columns, that have
pg_attribute.attnotnull set?
Links
1. /messages/by-id/2375492.jE0xQCEvom@aivenronan
2. /messages/by-id/830269.1656693747@sss.pgh.pa.us
------
Regards,
Alexander Korotkov
On 4/10/2023 07:12, Alexander Korotkov wrote:
Hi!
Thanks for the review!
I think this is a neat optimization.
On Tue, Sep 12, 2023 at 4:58 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:On 5/7/2023 21:28, Andrey Lepikhov wrote:
During the significant code revision in v.41 I lost some replacement
operations. Here is the fix and extra tests to check this in the future.
Also, Tom added the JoinDomain structure five months ago, and I added
code to replace relids for that place too.
One more thing, I found out that we didn't replace SJs, defined by
baserestrictinfos if no one self-join clause have existed for the join.
Now, it is fixed, and the test has been added.
To understand changes readily, see the delta file in the attachment.Here is new patch in attachment. Rebased on current master and some
minor gaffes are fixed.I went through the thread and I think the patch gets better shape. A
couple of notes from my side.
1) Why replace_relid() makes a copy of lids only on insert/replace of
a member, but performs deletion in-place?
Shortly speaking, it was done according to the 'Paranoid' strategy.
The main reason for copying before deletion was the case with the rinfo
required_relids and clause_relids. They both point to the same Bitmapset
in some cases. And we feared such things for other fields.
Right now, it may be redundant because we resolved the issue mentioned
above in replace_varno_walker.
Relid replacement machinery is the most contradictory code here. We used
a utilitarian approach and implemented a simplistic variant.
2) It would be nice to skip the insertion of IS NOT NULL checks when
they are not necessary. [1] points that infrastructure from [2] might
be useful. The patchset from [2] seems committed mow. However, I
can't see it is directly helpful in this matter. Could we just skip
adding IS NOT NULL clause for the columns, that have
pg_attribute.attnotnull set?
Thanks for the links, I will look into that case.
Links
1. /messages/by-id/2375492.jE0xQCEvom@aivenronan
2. /messages/by-id/830269.1656693747@sss.pgh.pa.us
--
regards,
Andrey Lepikhov
Postgres Professional
Hi!
On Wed, Oct 4, 2023 at 9:56 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru>
wrote:
On 4/10/2023 07:12, Alexander Korotkov wrote:
On Tue, Sep 12, 2023 at 4:58 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:On 5/7/2023 21:28, Andrey Lepikhov wrote:
During the significant code revision in v.41 I lost some replacement
operations. Here is the fix and extra tests to check this in the
future.
Also, Tom added the JoinDomain structure five months ago, and I added
code to replace relids for that place too.
One more thing, I found out that we didn't replace SJs, defined by
baserestrictinfos if no one self-join clause have existed for the
join.
Now, it is fixed, and the test has been added.
To understand changes readily, see the delta file in the attachment.Here is new patch in attachment. Rebased on current master and some
minor gaffes are fixed.I went through the thread and I think the patch gets better shape. A
couple of notes from my side.
1) Why replace_relid() makes a copy of lids only on insert/replace of
a member, but performs deletion in-place?Shortly speaking, it was done according to the 'Paranoid' strategy.
The main reason for copying before deletion was the case with the rinfo
required_relids and clause_relids. They both point to the same Bitmapset
in some cases. And we feared such things for other fields.
Right now, it may be redundant because we resolved the issue mentioned
above in replace_varno_walker.
OK, but my point is still that you should be paranoid in all the cases or
none of the cases. Right now (newId < 0) branch doesn't copy source
relids, but bms_is_member(oldId, relids) does copy. Also, I think whether
we copy or not should be reflected in the function comment.
/*
* Substitute newId by oldId in relids.
*/
static Bitmapset *
replace_relid(Relids relids, int oldId, int newId)
{
if (oldId < 0)
return relids;
if (newId < 0)
/* Delete relid without substitution. */
return bms_del_member(relids, oldId);
if (bms_is_member(oldId, relids))
return bms_add_member(bms_del_member(bms_copy(relids), oldId),
newId);
return relids;
}
Relid replacement machinery is the most contradictory code here. We used
a utilitarian approach and implemented a simplistic variant.
2) It would be nice to skip the insertion of IS NOT NULL checks when
they are not necessary. [1] points that infrastructure from [2] might
be useful. The patchset from [2] seems committed mow. However, I
can't see it is directly helpful in this matter. Could we just skip
adding IS NOT NULL clause for the columns, that have
pg_attribute.attnotnull set?Thanks for the links, I will look into that case.
OK, thank you.
------
Regards,
Alexander Korotkov
On 4/10/2023 14:34, Alexander Korotkov wrote:
Hi!
On Wed, Oct 4, 2023 at 9:56 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:On 4/10/2023 07:12, Alexander Korotkov wrote:
On Tue, Sep 12, 2023 at 4:58 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:On 5/7/2023 21:28, Andrey Lepikhov wrote:
During the significant code revision in v.41 I lost some replacement
operations. Here is the fix and extra tests to check this in thefuture.
Also, Tom added the JoinDomain structure five months ago, and I added
code to replace relids for that place too.
One more thing, I found out that we didn't replace SJs, defined by
baserestrictinfos if no one self-join clause have existed for thejoin.
Now, it is fixed, and the test has been added.
To understand changes readily, see the delta file in the attachment.Here is new patch in attachment. Rebased on current master and some
minor gaffes are fixed.I went through the thread and I think the patch gets better shape. A
couple of notes from my side.
1) Why replace_relid() makes a copy of lids only on insert/replace of
a member, but performs deletion in-place?Shortly speaking, it was done according to the 'Paranoid' strategy.
The main reason for copying before deletion was the case with the rinfo
required_relids and clause_relids. They both point to the same Bitmapset
in some cases. And we feared such things for other fields.
Right now, it may be redundant because we resolved the issue mentioned
above in replace_varno_walker.OK, but my point is still that you should be paranoid in all the cases
or none of the cases. Right now (newId < 0) branch doesn't copy source
relids, but bms_is_member(oldId, relids) does copy. Also, I think
whether we copy or not should be reflected in the function comment./*
* Substitute newId by oldId in relids.
*/
static Bitmapset *
replace_relid(Relids relids, int oldId, int newId)
{
if (oldId < 0)
return relids;if (newId < 0)
/* Delete relid without substitution. */
return bms_del_member(relids, oldId);if (bms_is_member(oldId, relids))
return bms_add_member(bms_del_member(bms_copy(relids), oldId),
newId);return relids;
}
We tried to use replace_relid() for both cases of JOIN deletion:
unneeded outer join and self-join, and the relid deletion is used only
in the first case. Such an approach was used there for a long time, and
we just didn't change it.
I am prone to removing the copy operation in the code of relid replacement.
Relid replacement machinery is the most contradictory code here. We used
a utilitarian approach and implemented a simplistic variant.2) It would be nice to skip the insertion of IS NOT NULL checks when
they are not necessary. [1] points that infrastructure from [2] might
be useful. The patchset from [2] seems committed mow. However, I
can't see it is directly helpful in this matter. Could we just skip
adding IS NOT NULL clause for the columns, that have
pg_attribute.attnotnull set?Thanks for the links, I will look into that case.
Thanks for the curious issue.
The new field Var::varnullingrels introduced in [2] doesn't make sense
here, as I see: we operate with plain relations only, and I don't know
how it can be applied to an arbitrary subtree contained OUTER JOINs.
The second option, the attnotnull flag, can be used in this code. We
haven't implemented it because the process_equivalence routine doesn't
check the attnotnull before creating NullTest.
In general, it is not a difficult operation - we just need to add a
trivial get_attnotnull() routine to lssycache.c likewise
get_attgenerated() and other functions.
But, replace_varno uses the walker to change the relid. The mentioned
replacement, like
X=X --> X IS NOT NULL
can be applied on different levels of the expression, look:
A a1 JOIN A a2 ON (a1.id=a2.id) WHERE (a1.x AND (a1.y=a2.y))
Here, we can replace id=id and y=y. It may need some 'unwanted clauses'
collection procedure and a second pass through the expression tree to
remove them. It may add some unpredictable overhead.
We can replace such a clause with a trivial 'TRUE' clause, of course.
But is it the feature you have requested?
--
regards,
Andrey Lepikhov
Postgres Professional
On 4/10/2023 14:34, Alexander Korotkov wrote:
Relid replacement machinery is the most contradictory code here. We used
a utilitarian approach and implemented a simplistic variant.2) It would be nice to skip the insertion of IS NOT NULL checks when
they are not necessary. [1] points that infrastructure from [2] might
be useful. The patchset from [2] seems committed mow. However, I
can't see it is directly helpful in this matter. Could we just skip
adding IS NOT NULL clause for the columns, that have
pg_attribute.attnotnull set?Thanks for the links, I will look into that case.
To be more precise, in the attachment, you can find a diff to the main
patch, which shows the volume of changes to achieve the desired behaviour.
Some explains in regression tests shifted. So, I've made additional tests:
DROP TABLE test CASCADE;
CREATE TABLE test (a int, b int not null);
CREATE UNIQUE INDEX abc ON test(b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
CREATE UNIQUE INDEX abc1 ON test(a,b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
DROP INDEX abc1;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);
We have almost the results we wanted to have. But in the last explain
you can see that nothing happened with the OR clause. We should use the
expression mutator instead of walker to handle such clauses. But It
doesn't process the RestrictInfo node ... I'm inclined to put a solution
of this issue off for a while.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
check_attnotnull.difftext/plain; charset=UTF-8; name=check_attnotnull.diffDownload
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index a127239d30..c12aa15fc9 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -73,7 +73,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
List *restrictlist,
List **extra_clauses);
static Bitmapset *replace_relid(Relids relids, int oldId, int newId);
-static void replace_varno(Node *node, int from, int to);
+static void replace_varno(PlannerInfo *root, Node *node, int from, int to);
/*
@@ -388,7 +388,7 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo *rel,
}
/* Update lateral references. */
- replace_varno((Node *) otherrel->lateral_vars, relid, subst);
+ replace_varno(root, (Node *) otherrel->lateral_vars, relid, subst);
}
/*
@@ -425,7 +425,7 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo *rel,
sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
- replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst);
+ replace_varno(root, (Node *) sjinf->semi_rhs_exprs, relid, subst);
}
/*
@@ -1399,6 +1399,7 @@ is_innerrel_unique_for(PlannerInfo *root,
typedef struct ReplaceVarnoContext
{
+ PlannerInfo *root;
int from;
int to;
} ReplaceVarnoContext;
@@ -1420,6 +1421,11 @@ replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
}
return false;
}
+
+ /*
+ * Expression walker doesn't know about RestrictInfo node. Do recursive pass
+ * into the clauses manually.
+ */
if (IsA(node, RestrictInfo))
{
RestrictInfo *rinfo = (RestrictInfo *) node;
@@ -1429,20 +1435,26 @@ replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
if (bms_is_member(ctx->from, rinfo->clause_relids))
{
- replace_varno((Node *) rinfo->clause, ctx->from, ctx->to);
- replace_varno((Node *) rinfo->orclause, ctx->from, ctx->to);
- rinfo->clause_relids = replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
- rinfo->left_relids = replace_relid(rinfo->left_relids, ctx->from, ctx->to);
- rinfo->right_relids = replace_relid(rinfo->right_relids, ctx->from, ctx->to);
+ replace_varno(ctx->root, (Node *) rinfo->clause, ctx->from, ctx->to);
+ replace_varno(ctx->root, (Node *) rinfo->orclause, ctx->from, ctx->to);
+ rinfo->clause_relids =
+ replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
+ rinfo->left_relids =
+ replace_relid(rinfo->left_relids, ctx->from, ctx->to);
+ rinfo->right_relids =
+ replace_relid(rinfo->right_relids, ctx->from, ctx->to);
}
if (is_req_equal)
rinfo->required_relids = rinfo->clause_relids;
else
- rinfo->required_relids = replace_relid(rinfo->required_relids, ctx->from, ctx->to);
+ rinfo->required_relids =
+ replace_relid(rinfo->required_relids, ctx->from, ctx->to);
- rinfo->outer_relids = replace_relid(rinfo->outer_relids, ctx->from, ctx->to);
- rinfo->incompatible_relids = replace_relid(rinfo->incompatible_relids, ctx->from, ctx->to);
+ rinfo->outer_relids =
+ replace_relid(rinfo->outer_relids, ctx->from, ctx->to);
+ rinfo->incompatible_relids =
+ replace_relid(rinfo->incompatible_relids, ctx->from, ctx->to);
if (rinfo->mergeopfamilies &&
bms_get_singleton_member(rinfo->clause_relids, &relid) &&
@@ -1456,13 +1468,31 @@ replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
if (leftOp != NULL && equal(leftOp, rightOp))
{
- NullTest *ntest = makeNode(NullTest);
+ Oid reloid = 0;
+ AttrNumber attnum;
+
+ if (IsA(leftOp, Var))
+ {
+ Var *var = (Var *) leftOp;
+ reloid = ctx->root->simple_rte_array[var->varno]->relid;
+ attnum = var->varattno;
+ }
+
+ if (OidIsValid(reloid) && get_attnotnull(reloid, attnum))
+ {
+ rinfo->clause = (Expr *) makeBoolConst(true, false);
+ }
+ else
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ }
- ntest->arg = leftOp;
- ntest->nulltesttype = IS_NOT_NULL;
- ntest->argisrow = false;
- ntest->location = -1;
- rinfo->clause = (Expr *) ntest;
rinfo->mergeopfamilies = NIL;
}
Assert(rinfo->orclause == NULL);
@@ -1477,13 +1507,14 @@ replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
* Replace each occurence of removing relid with the keeping one
*/
static void
-replace_varno(Node *node, int from , int to)
+replace_varno(PlannerInfo *root, Node *node, int from , int to)
{
ReplaceVarnoContext ctx;
if (to <= 0)
return;
+ ctx.root = root;
ctx.from = from;
ctx.to = to;
replace_varno_walker(node, &ctx);
@@ -1531,7 +1562,7 @@ replace_relid(Relids relids, int oldId, int newId)
* delete them.
*/
static void
-update_eclass(EquivalenceClass *ec, int from, int to)
+update_eclass(PlannerInfo *root, EquivalenceClass *ec, int from, int to)
{
List *new_members = NIL;
List *new_sources = NIL;
@@ -1553,7 +1584,7 @@ update_eclass(EquivalenceClass *ec, int from, int to)
em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
/* We only process inner joins */
- replace_varno((Node *) em->em_expr, from, to);
+ replace_varno(root, (Node *) em->em_expr, from, to);
foreach(lc1, new_members)
{
@@ -1591,7 +1622,7 @@ update_eclass(EquivalenceClass *ec, int from, int to)
continue;
}
- replace_varno((Node *) rinfo, from, to);
+ replace_varno(root, (Node *) rinfo, from, to);
/*
* After switching the clause to the remaining relation, check it for
@@ -1685,7 +1716,7 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
- replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+ replace_varno(root, (Node *) rinfo, toRemove->relid, toKeep->relid);
if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
jinfo_candidates = lappend(jinfo_candidates, rinfo);
@@ -1704,7 +1735,7 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
{
RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
- replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+ replace_varno(root, (Node *) rinfo, toRemove->relid, toKeep->relid);
if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
jinfo_candidates = lappend(jinfo_candidates, rinfo);
@@ -1792,7 +1823,7 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
{
EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
- update_eclass(ec, toRemove->relid, toKeep->relid);
+ update_eclass(root, ec, toRemove->relid, toKeep->relid);
toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
}
@@ -1804,7 +1835,7 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
{
Node *node = lfirst(lc);
- replace_varno(node, toRemove->relid, toKeep->relid);
+ replace_varno(root, node, toRemove->relid, toKeep->relid);
if (!list_member(toKeep->reltarget->exprs, node))
toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
}
@@ -1851,10 +1882,10 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
/* At last, replace links in the planner info */
remove_rel_from_query_common(root, toRemove, toKeep->relid, NULL, NULL);
/* replace varno in root targetlist and HAVING clause */
- replace_varno((Node *) root->processed_tlist,
- toRemove->relid, toKeep->relid);
- replace_varno((Node *) root->processed_groupClause,
- toRemove->relid, toKeep->relid);
+ replace_varno(root, (Node *) root->processed_tlist,
+ toRemove->relid, toKeep->relid);
+ replace_varno(root, (Node *) root->processed_groupClause,
+ toRemove->relid, toKeep->relid);
/* ... and remove the rel from the baserel array */
root->simple_rel_array[toRemove->relid] = NULL;
pfree(toRemove);
@@ -1914,7 +1945,8 @@ split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
* Quite expensive operation, narrowing the use case. For example, when
* we have cast of the same var to different (but compatible) types.
*/
- replace_varno(rightexpr, bms_singleton_member(rinfo->right_relids),
+ replace_varno(root, rightexpr,
+ bms_singleton_member(rinfo->right_relids),
bms_singleton_member(rinfo->left_relids));
if (equal(leftexpr, rightexpr))
@@ -1954,7 +1986,7 @@ match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
bms_is_empty(rinfo->right_relids));
clause = (Expr *) copyObject(rinfo->clause);
- replace_varno((Node *) clause, relid, outer->relid);
+ replace_varno(root, (Node *) clause, relid, outer->relid);
iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
get_leftop(clause);
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index fc6d267e44..7778ed483f 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -929,6 +929,25 @@ get_attgenerated(Oid relid, AttrNumber attnum)
return result;
}
+char
+get_attnotnull(Oid relid, AttrNumber attnum)
+{
+ HeapTuple tp;
+ Form_pg_attribute att_tup;
+ char result;
+
+ tp = SearchSysCache2(ATTNUM,
+ ObjectIdGetDatum(relid),
+ Int16GetDatum(attnum));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for attribute %d of relation %u",
+ attnum, relid);
+ att_tup = (Form_pg_attribute) GETSTRUCT(tp);
+ result = att_tup->attnotnull;
+ ReleaseSysCache(tp);
+ return result;
+}
+
/*
* get_atttype
*
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index f5fdbfe116..02521299a1 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -92,6 +92,7 @@ extern char *get_attname(Oid relid, AttrNumber attnum, bool missing_ok);
extern AttrNumber get_attnum(Oid relid, const char *attname);
extern int get_attstattarget(Oid relid, AttrNumber attnum);
extern char get_attgenerated(Oid relid, AttrNumber attnum);
+extern char get_attnotnull(Oid relid, AttrNumber attnum);
extern Oid get_atttype(Oid relid, AttrNumber attnum);
extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum,
Oid *typid, int32 *typmod, Oid *collid);
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index de71441052..3d5de28354 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -438,15 +438,14 @@ set enable_mergejoin to off;
explain (costs off)
select * from ec0 m join ec0 n on m.ff = n.ff
join ec1 p on m.ff + n.ff = p.f1;
- QUERY PLAN
-----------------------------------------
+ QUERY PLAN
+---------------------------------------
Nested Loop
Join Filter: ((n.ff + n.ff) = p.f1)
- -> Seq Scan on ec1 p
+ -> Seq Scan on ec0 n
-> Materialize
- -> Seq Scan on ec0 n
- Filter: (ff IS NOT NULL)
-(6 rows)
+ -> Seq Scan on ec1 p
+(5 rows)
explain (costs off)
select * from ec0 m join ec0 n on m.ff = n.ff
@@ -455,11 +454,10 @@ explain (costs off)
---------------------------------------------------------------
Nested Loop
Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
- -> Seq Scan on ec1 p
+ -> Seq Scan on ec0 n
-> Materialize
- -> Seq Scan on ec0 n
- Filter: (ff IS NOT NULL)
-(6 rows)
+ -> Seq Scan on ec1 p
+(5 rows)
reset enable_mergejoin;
-- this could be converted, but isn't at present
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 027c356bcc..10b23944fe 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6337,14 +6337,14 @@ SELECT * FROM pg_am am WHERE am.amname IN (
JOIN pg_class c2
ON c1.oid=c2.oid AND c1.oid < 10
);
- QUERY PLAN
----------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------
Nested Loop Semi Join
Join Filter: (am.amname = c2.relname)
-> Seq Scan on pg_am am
-> Materialize
-> Index Scan using pg_class_oid_index on pg_class c2
- Index Cond: ((oid < '10'::oid) AND (oid IS NOT NULL))
+ Index Cond: (oid < '10'::oid)
(6 rows)
--
@@ -6599,14 +6599,14 @@ SELECT COUNT(*) FROM tab_with_flag
WHERE
(is_flag IS NULL OR is_flag = 0)
AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
- QUERY PLAN
-----------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------
Aggregate
-> Bitmap Heap Scan on tab_with_flag
- Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
Filter: ((is_flag IS NULL) OR (is_flag = 0))
-> Bitmap Index Scan on tab_with_flag_pkey
- Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
(6 rows)
DROP TABLE tab_with_flag;
@@ -6764,11 +6764,11 @@ reset self_join_search_limit;
CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
explain (verbose, costs off)
SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
- QUERY PLAN
-----------------------------------------------------------
+ QUERY PLAN
+------------------------------------------
Seq Scan on public.emp1 e2
Output: e2.id, e2.code, e2.id, e2.code
- Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+ Filter: (e2.code <> e2.code)
(3 rows)
-- We can remove the join even if we find the join can't duplicate rows and
Hi!
I have reviewed your patch and I noticed a few things.
First of all, I think I found a bug in your latest patch version, and
this query shows it:
EXPLAIN (COSTS OFF)
SELECT c.oid, e.oid FROM pg_class c FULL JOIN (
SELECT e1.oid FROM pg_extension e1, pg_extension e2
WHERE e1.oid=e2.oid) AS e
ON c.oid=e.oid;
In the current version we get such a query plan:
QUERY PLAN
-----------------------------------------
Hash Full Join
Hash Cond: (c.oid = e2.oid)
-> Seq Scan on pg_class c
-> Hash
-> Seq Scan on pg_extension e2
(5 rows)
But I think it should be:
QUERY PLAN
-----------------------------------------
Hash Full Join
Hash Cond: (c.oid = e2.oid)
-> Seq Scan on pg_class c
-> Hash
-> Seq Scan on pg_extension e2
*Filter: (oid IS NOT NULL)*
(6 rows)
I have looked at the latest version of the code, I assume that the error
lies in the replace_varno_walker function, especially in the place where
we check the node by type Var, and does not form any NullTest node.
if (OidIsValid(reloid) && get_attnotnull(reloid, attnum)) -- this
condition works
{
rinfo->clause = (Expr *) makeBoolConst(true, false);
}
else
{
NullTest *ntest = makeNode(NullTest);
ntest->arg = leftOp;
ntest->nulltesttype = IS_NOT_NULL;
ntest->argisrow = false;
ntest->location = -1;
rinfo->clause = (Expr *) ntest;
}
Secondly, I added some code in some places to catch erroneous cases and
added a condition when we should not try to apply the self-join-removal
transformation due to the absence of an empty self-join list after
searching for it and in general if there are no joins in the query.
Besides, I added a query for testing and wrote about it above. I have
attached my diff file.
In addition, I found a comment for myself that was not clear to me. I
would be glad if you could explain it to me.
You mentioned superior outer join in the comment, unfortunately, I
didn't find anything about it in the PostgreSQL code, and this
explanation remained unclear to me. Could you explain in more detail
what you meant?
/*
* At this stage joininfo lists of inner and outer can contain
* only clauses, required for *a superior outer join* that can't
* influence on this optimization. So, we can avoid to call the
* build_joinrel_restrictlist() routine.
*/
restrictlist = generate_join_implied_equalities(root, joinrelids,
inner->relids,
outer, NULL);
--
Regards,
Alena Rybakina
Attachments:
self-join-removal.difftext/x-patch; charset=UTF-8; name=self-join-removal.diffDownload
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 3e10083905c..5ba5ca693f1 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1704,6 +1704,8 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
List *binfo_candidates = NIL;
ReplaceVarnoContext ctx = {.from = toRemove->relid, .to = toKeep->relid};
+ Assert(toKeep->relid != -1);
+
/*
* Replace index of removing table with the keeping one. The technique of
* removing/distributing restrictinfo is used here to attach just appeared
@@ -2007,6 +2009,8 @@ match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
/* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
continue;
+ Assert(is_opclause(orinfo->clause));
+
oclause = bms_is_empty(orinfo->left_relids) ?
get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
c2 = (bms_is_empty(orinfo->left_relids) ?
@@ -2150,6 +2154,18 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
split_selfjoin_quals(root, restrictlist, &selfjoinquals,
&otherjoinquals, inner->relid, outer->relid);
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX:
+ * we would detect self-join without quals like 'x==x' if we had
+ * an foreign key constraint on some of other quals and this join
+ * haven't any columns from the outer in the target list.
+ * But it is still complex task.
+ */
+ continue;
+ }
+
/*
* To enable SJE for the only degenerate case without any self join
* clauses at all, add baserestrictinfo to this list.
@@ -2332,7 +2348,7 @@ remove_useless_self_joins(PlannerInfo *root, List *joinlist)
Relids ToRemove = NULL;
int relid = -1;
- if (!enable_self_join_removal)
+ if ((list_length(joinlist) <=1 && !IsA(linitial(joinlist), List)) || !enable_self_join_removal)
return joinlist;
/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 10b23944feb..800410d6b18 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5807,13 +5807,11 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
-------------------------------------------------
+ QUERY PLAN
+--------------------------
Result
One-Time Filter: false
- -> Index Scan using parent_pkey on parent x
- Index Cond: (k = 1)
-(4 rows)
+(2 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -6759,6 +6757,20 @@ explain (costs off) -- Collapse both self joins
Filter: (a IS NOT NULL)
(2 rows)
+EXPLAIN (COSTS OFF)
+SELECT c.oid, e.oid FROM pg_class c FULL JOIN (
+ SELECT e1.oid FROM pg_extension e1, pg_extension e2
+ WHERE e1.oid=e2.oid) AS e
+ ON c.oid=e.oid;
+ QUERY PLAN
+-----------------------------------------
+ Hash Full Join
+ Hash Cond: (c.oid = e2.oid)
+ -> Seq Scan on pg_class c
+ -> Hash
+ -> Seq Scan on pg_extension e2
+(5 rows)
+
reset self_join_search_limit;
-- Check that clauses from the join filter list is not lost on the self-join removal
CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index a73c1f90c4a..1950e6f281f 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2499,13 +2499,16 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
---------------------------------------------------
- Update on base_tbl
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(4 rows)
+ QUERY PLAN
+-------------------------------------------------------------------
+ Update on base_tbl base_tbl_1
+ -> Nested Loop
+ -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
+ Index Cond: (id = 1)
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(7 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 1a643638372..4d49c0767a0 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2564,6 +2564,11 @@ explain (costs off) -- Collapse one self join
set self_join_search_limit to 3;
explain (costs off) -- Collapse both self joins
select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+EXPLAIN (COSTS OFF)
+SELECT c.oid, e.oid FROM pg_class c FULL JOIN (
+ SELECT e1.oid FROM pg_extension e1, pg_extension e2
+ WHERE e1.oid=e2.oid) AS e
+ ON c.oid=e.oid;
reset self_join_search_limit;
-- Check that clauses from the join filter list is not lost on the self-join removal
On 11/10/2023 02:29, Alena Rybakina wrote:
I have reviewed your patch and I noticed a few things.
Thanks for your efforts,
I have looked at the latest version of the code, I assume that the error
lies in the replace_varno_walker function, especially in the place where
we check the node by type Var, and does not form any NullTest node.
It's not a bug, it's an optimization we discussed with Alexander above.
Secondly, I added some code in some places to catch erroneous cases and
added a condition when we should not try to apply the self-join-removal
transformation due to the absence of an empty self-join list after
searching for it and in general if there are no joins in the query.
Besides, I added a query for testing and wrote about it above. I have
attached my diff file.
Ok, I will look at this
In addition, I found a comment for myself that was not clear to me. I
would be glad if you could explain it to me.You mentioned superior outer join in the comment, unfortunately, I
didn't find anything about it in the PostgreSQL code, and this
explanation remained unclear to me. Could you explain in more detail
what you meant?
I meant here that only clauses pushed by reconsider_outer_join_clauses()
can be found in the joininfo list, and they are not relevant, as you can
understand.
Having written that, I realized that it was a false statement. ;) -
joininfo can also contain results of previous SJE iterations, look:
CREATE TABLE test (oid int PRIMARY KEY);
CREATE UNIQUE INDEX ON test((oid*oid));
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c1.oid=c2.oid AND c1.oid*c2.oid=c3.oid*c3.oid;
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c1.oid=c3.oid AND c1.oid*c3.oid=c2.oid*c2.oid;
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c3.oid=c2.oid AND c3.oid*c2.oid=c1.oid*c1.oid;
Having executed this SQL code, you could see that in the last query, the
SJE feature didn't delete one of the JOINs because of the reason I had
written above.
It's not an one-minute fix - I will try to propose solution a bit later.
--
regards,
Andrey Lepikhov
Postgres Professional
On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
On 4/10/2023 14:34, Alexander Korotkov wrote:
Relid replacement machinery is the most contradictory code here. We used
a utilitarian approach and implemented a simplistic variant.2) It would be nice to skip the insertion of IS NOT NULL checks when
they are not necessary. [1] points that infrastructure from [2] might
be useful. The patchset from [2] seems committed mow. However, I
can't see it is directly helpful in this matter. Could we just skip
adding IS NOT NULL clause for the columns, that have
pg_attribute.attnotnull set?Thanks for the links, I will look into that case.
To be more precise, in the attachment, you can find a diff to the main
patch, which shows the volume of changes to achieve the desired behaviour.
Some explains in regression tests shifted. So, I've made additional tests:DROP TABLE test CASCADE;
CREATE TABLE test (a int, b int not null);
CREATE UNIQUE INDEX abc ON test(b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
CREATE UNIQUE INDEX abc1 ON test(a,b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
DROP INDEX abc1;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);We have almost the results we wanted to have. But in the last explain
you can see that nothing happened with the OR clause. We should use the
expression mutator instead of walker to handle such clauses. But It
doesn't process the RestrictInfo node ... I'm inclined to put a solution
of this issue off for a while.
OK. I think it doesn't worth to eliminate IS NULL quals with this
complexity (at least at this stage of work).
I made improvements over the code. Mostly new comments, grammar
corrections of existing comments and small refactoring.
Also, I found that the suggestion from David Rowley [1] to qsort
array of relations to faster find duplicates is still unaddressed.
I've implemented it. That helps to evade quadratic complexity with
large number of relations.
Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found. It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).
Links
1. /messages/by-id/CAKJS1f8ySSsBfooH3bJK7OD3LBEbDb99d8J_FtqDd6w50p-eAQ@mail.gmail.com
2. /messages/by-id/96f66ae3-df10-4060-9844-4c9633062cd3@yandex.ru
------
Regards,
Alexander Korotkov
Attachments:
0001-Remove-useless-self-joins-v44.patchapplication/octet-stream; name=0001-Remove-useless-self-joins-v44.patchDownload
From 75101f1553015e979bdaae3978004cf8b4233e4b Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Tue, 12 Sep 2023 18:25:51 +0700
Subject: [PATCH] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1186 ++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/utils/misc/guc_tables.c | 22 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 7 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 839 +++++++++++-
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/expected/updatable_views.out | 17 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 365 +++++
src/tools/pgindent/typedefs.list | 2 +
14 files changed, 2484 insertions(+), 68 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 924309af26d..c1870b3e298 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5305,6 +5305,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ query tree and replaces self joins with semantically equivalent single
+ scans. Take into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 6a93d767a5c..379563e87a7 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3549,6 +3565,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3600,6 +3617,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniquiness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3642,7 +3677,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 5f3cce873a0..f0746f35a3c 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,8 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
+#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -33,10 +35,46 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * 'extra_clauses' contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for the SJ checking procedure: SJ
+ * can be removed if the outer relation contains strictly the same set of
+ * clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+/*
+ * The context for replace_varno_walker() containing source and target relids2
+ */
+typedef struct
+{
+ int from;
+ int to;
+} ReplaceVarnoContext;
+
+/*
+ * The struct containing self-join candidate. Used to find duplicate reloids.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_removal;
+int self_join_search_limit;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -44,14 +82,20 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static Bitmapset *replace_relid(Relids relids, int oldId, int newId);
+static void replace_varno(Node *node, int from, int to);
+static bool replace_varno_walker(Node *node, ReplaceVarnoContext *ctx);
+static int self_join_candidates_cmp(const void *a, const void *b);
+
/*
@@ -89,7 +133,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -308,7 +352,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -320,29 +364,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Remove references to the rel from other baserels' attr_needed arrays.
*/
@@ -366,19 +405,22 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ replace_relid(otherrel->attr_needed[attroff], relid, subst);
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], ojrelid);
+ replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
+
+ /* Update lateral references. */
+ replace_varno((Node *) otherrel->lateral_vars, relid, subst);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -392,19 +434,21 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst);
}
/*
@@ -438,18 +482,47 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ phinfo->ph_var->phrels = replace_relid(phinfo->ph_var->phrels, relid, subst);
Assert(!bms_is_empty(phv->phrels));
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
/*
* Remove any joinquals referencing the rel from the joininfo lists.
@@ -844,9 +917,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the outer_exprs, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +937,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,9 +1255,30 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also return additional clauses
+ * from a baserestrictinfo list that were used to prove uniqueness.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1200,10 +1300,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1324,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1337,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1391,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1324,5 +1434,973 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Replace each occurence of removing relid with the keeping one
+ */
+static void
+replace_varno(Node *node, int from, int to)
+{
+ ReplaceVarnoContext ctx;
+
+ if (to <= 0)
+ return;
+
+ ctx.from = from;
+ ctx.to = to;
+ replace_varno_walker(node, &ctx);
+}
+
+/*
+ * Walker function for replace_varno()
+ */
+static bool
+replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+
+ if (bms_is_member(ctx->from, rinfo->clause_relids))
+ {
+ replace_varno((Node *) rinfo->clause, ctx->from, ctx->to);
+ replace_varno((Node *) rinfo->orclause, ctx->from, ctx->to);
+ rinfo->clause_relids = replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
+ rinfo->left_relids = replace_relid(rinfo->left_relids, ctx->from, ctx->to);
+ rinfo->right_relids = replace_relid(rinfo->right_relids, ctx->from, ctx->to);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = replace_relid(rinfo->required_relids, ctx->from, ctx->to);
+
+ rinfo->outer_relids = replace_relid(rinfo->outer_relids, ctx->from, ctx->to);
+ rinfo->incompatible_relids = replace_relid(rinfo->incompatible_relids, ctx->from, ctx->to);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ relid == ctx->to && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+
+ return false;
+ }
+ return expression_tree_walker(node, replace_varno_walker, (void *) ctx);
+}
+
+/*
+ * Substitute newId by oldId in relids.
+ */
+static Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (oldId < 0)
+ return relids;
+
+ if (newId < 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(relids, oldId), newId);
+
+ return relids;
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+ ListCell *lc;
+ ListCell *lc1;
+
+ foreach(lc, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ replace_varno((Node *) em->em_expr, from, to);
+
+ foreach(lc1, new_members)
+ {
+ EquivalenceMember *other = lfirst_node(EquivalenceMember, lc1);
+
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach(lc, ec->ec_sources)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ replace_varno((Node *) rinfo, from, to);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach(lc1, new_sources)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, lc1);
+
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+static bool
+sje_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ return expression_tree_walker(node, sje_walker, (void *) ctx);
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+ ReplaceVarnoContext ctx = {.from = toRemove->relid,.to = toKeep->relid};
+
+ Assert(toKeep->relid != -1);
+
+ /*
+ * Replace index of removing table with the keeping one. The technique of
+ * removing/distributing restrictinfo is used here to attach just appeared
+ * (for keeping relation) join clauses and avoid of adding duplicates of
+ * those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach(lc, joininfos)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach(lc, toRemove->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side we reduce length of restrict
+ * lists that can impact planning or executing time. Additionally, we
+ * improve accuracy of cardinality estimation. On the other side, it is
+ * one more place which can make planning time much longer in specific
+ * cases. It would have been better to avoid calling the equal() function
+ * here, but it's only way to detect duplicated inequality expressions.
+ */
+
+ foreach(lc, binfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach(lc, jinfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->joininfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing table, with the keeping
+ * one: varno of removin table should be replaced in members and sources
+ * lists. Also, remove duplicated elements if this replacement procedure
+ * created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+
+ replace_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(
+ toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures */
+ query_tree_walker(root->parse, sje_walker, &ctx, QTW_EXAMINE_SORTGROUP);
+
+ /* At last, replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+ /* replace varno in root targetlist and HAVING clause */
+ replace_varno((Node *) root->processed_tlist,
+ toRemove->relid, toKeep->relid);
+ replace_varno((Node *) root->processed_groupClause,
+ toRemove->relid, toKeep->relid);
+ /* ... and remove the rel from the baserel array */
+ root->simple_rel_array[toRemove->relid] = NULL;
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ replace_varno(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ replace_varno((Node *) clause, relid, outer->relid);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with corresponding sides of the
+ * outer's filters. If no one detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+ if (lc)
+ continue;
+
+ /*
+ * At this stage joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence on this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+
+ /*
+ * Process restrictlist to seperate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list.
+ * Degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins. To avoid complexity, limit the max power of this set by a GUC.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ RangeTblRef *ref = (RangeTblRef *) jlnode;
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ root->simple_rel_array[ref->rtindex] != NULL)
+ {
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+ }
+ }
+ else if (IsA(jlnode, List))
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (bms_num_members(relids) >= self_join_search_limit)
+ break;
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ pg_qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create group of relation indexes with the same oid */
+ Relids group = NULL;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+
+ relids = bms_del_members(relids, group);
+ toRemove = bms_add_members(toRemove,
+ remove_self_joins_one_group(root, group));
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join cadidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clause sare assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead,
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ int nremoved = 0;
+
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index fcc0eacd253..be5ef79af39 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -231,6 +231,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 16ec6c5ef02..fbd371daf30 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1027,6 +1027,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
@@ -2078,6 +2088,18 @@ struct config_int ConfigureNamesInt[] =
8, 1, INT_MAX,
NULL, NULL, NULL
},
+ {
+ {"self_join_search_limit", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Max size of a group of plain tables where planner "
+ "will lookup for self joins."),
+ gettext_noop("If a number of tables in FROM-list is exceeded this "
+ "limit, the planner will ignore tables beyond it."),
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &self_join_search_limit,
+ 32, 1, INT_MAX,
+ NULL, NULL, NULL
+ },
{
{"geqo_threshold", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Sets the threshold of FROM items beyond which GEQO is used."),
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 50bc3b503a6..37f577e9b00 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -75,6 +75,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 31c188176b7..5f45ffda57e 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,8 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
+extern PGDLLIMPORT int self_join_search_limit;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -104,6 +106,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fed..de714410524 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9b8638f286a..12a90bd42e8 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5807,11 +5807,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -6134,6 +6136,837 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+---------------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: ((oid < '10'::oid) AND (oid IS NOT NULL))
+(6 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = (a * a))
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ Join Filter: (j1.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Nested Loop
+ Join Filter: (j1.a = j2.a)
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(9 rows)
+
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j2.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj j3
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT c.oid, e.oid FROM pg_class c FULL JOIN (
+ SELECT e1.oid FROM pg_extension e1, pg_extension e2
+ WHERE e1.oid=e2.oid) AS e
+ ON c.oid=e.oid;
+ QUERY PLAN
+-----------------------------------------
+ Hash Full Join
+ Hash Cond: (c.oid = e2.oid)
+ -> Seq Scan on pg_class c
+ -> Hash
+ -> Seq Scan on pg_extension e2
+ Filter: (oid IS NOT NULL)
+(6 rows)
+
+reset self_join_search_limit;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index aae5d51e1c9..271313ebf86 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1950e6f281f..a73c1f90c4a 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2499,16 +2499,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
- Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ QUERY PLAN
+--------------------------------------------------
+ Update on base_tbl
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a31055..77dd964ebf2 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3e5032b04dd..4d49c0767a0 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2309,6 +2309,371 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+EXPLAIN (COSTS OFF)
+SELECT c.oid, e.oid FROM pg_class c FULL JOIN (
+ SELECT e1.oid FROM pg_extension e1, pg_extension e2
+ WHERE e1.oid=e2.oid) AS e
+ ON c.oid=e.oid;
+reset self_join_search_limit;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8de90c49585..2ff4881fdfd 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -367,6 +367,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -3833,6 +3834,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.3 (Apple Git-145)
On 12/10/2023 18:32, Alexander Korotkov wrote:
On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
We have almost the results we wanted to have. But in the last explain
you can see that nothing happened with the OR clause. We should use the
expression mutator instead of walker to handle such clauses. But It
doesn't process the RestrictInfo node ... I'm inclined to put a solution
of this issue off for a while.OK. I think it doesn't worth to eliminate IS NULL quals with this
complexity (at least at this stage of work).
Yeah. I think It would be meaningful in the case of replacing also
nested x IS NOT NULL with nothing. But it requires using a mutator
instead of the walker and may be done more accurately next time.
I made improvements over the code. Mostly new comments, grammar
corrections of existing comments and small refactoring.
Great!
Also, I found that the suggestion from David Rowley [1] to qsort
array of relations to faster find duplicates is still unaddressed.
I've implemented it. That helps to evade quadratic complexity with
large number of relations.
I see. The thread is too long so far, thanks for the catch.
Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found. It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).
Agree. I wouldn't say I like it too. But also, I suggest skipping some
unnecessary assertions proposed in that patch:
Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all the
negative numbers, at least?
Assert(is_opclause(orinfo->clause)); - above we skip clauses with
rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already
checked as is_opclause.
All these changes (see in the attachment) are optional.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
minor-changes-v44.difftext/plain; charset=UTF-8; name=minor-changes-v44.diffDownload
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index f0746f35a3..7b8dc7a2b7 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1710,8 +1710,6 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
List *binfo_candidates = NIL;
ReplaceVarnoContext ctx = {.from = toRemove->relid,.to = toKeep->relid};
- Assert(toKeep->relid != -1);
-
/*
* Replace index of removing table with the keeping one. The technique of
* removing/distributing restrictinfo is used here to attach just appeared
@@ -2017,8 +2015,6 @@ match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
/* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
continue;
- Assert(is_opclause(orinfo->clause));
-
oclause = bms_is_empty(orinfo->left_relids) ?
get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
c2 = (bms_is_empty(orinfo->left_relids) ?
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2ff4881fdf..96ebd6eed3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -367,7 +367,6 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
-ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2341,6 +2340,7 @@ ReorderBufferUpdateProgressTxnCB
ReorderTuple
RepOriginId
ReparameterizeForeignPathByChild_function
+ReplaceVarnoContext
ReplaceVarsFromTargetList_context
ReplaceVarsNoMatchOption
ReplicaIdentityStmt
@@ -2474,6 +2474,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found. It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).Agree. I wouldn't say I like it too. But also, I suggest skipping some
unnecessary assertions proposed in that patch:
Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all the
negative numbers, at least?
Assert(is_opclause(orinfo->clause)); - above we skip clauses with
rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already
checked as is_opclause.
All these changes (see in the attachment) are optional.
I don't mind about asserts, maybe I misunderstood something in the patch.
About skipping SJ removal when no SJ quals is found, I assume it is
about it:
split_selfjoin_quals(root, restrictlist, &selfjoinquals,
&otherjoinquals, inner->relid,
outer->relid);
+ if (list_length(selfjoinquals) == 0)
+ {
+ /*
+ * XXX:
+ * we would detect self-join without quals like 'x==x'
if we had
+ * an foreign key constraint on some of other quals
and this join
+ * haven't any columns from the outer in the target list.
+ * But it is still complex task.
+ */
+ continue;
+ }
as far as I remember, this is the place where it is checked that the SJ
list is empty and it is logical, in my opinion, that no transformations
should be performed if no elements are found for them.
As for the cases where SJ did not work, maybe this is just right if
there are no elements for processing these cases. I'll try to check or
come up with tests for them. If I'm wrong, write.
On 11.10.2023 06:51, Andrei Lepikhov wrote:
On 11/10/2023 02:29, Alena Rybakina wrote:
I have reviewed your patch and I noticed a few things.
Thanks for your efforts,
I have looked at the latest version of the code, I assume that the
error lies in the replace_varno_walker function, especially in the
place where we check the node by type Var, and does not form any
NullTest node.It's not a bug, it's an optimization we discussed with Alexander above.
Secondly, I added some code in some places to catch erroneous cases
and added a condition when we should not try to apply the
self-join-removal transformation due to the absence of an empty
self-join list after searching for it and in general if there are no
joins in the query. Besides, I added a query for testing and wrote
about it above. I have attached my diff file.Ok, I will look at this
In addition, I found a comment for myself that was not clear to me. I
would be glad if you could explain it to me.You mentioned superior outer join in the comment, unfortunately, I
didn't find anything about it in the PostgreSQL code, and this
explanation remained unclear to me. Could you explain in more detail
what you meant?I meant here that only clauses pushed by
reconsider_outer_join_clauses() can be found in the joininfo list, and
they are not relevant, as you can understand.
Having written that, I realized that it was a false statement. ;) -
joininfo can also contain results of previous SJE iterations, look:CREATE TABLE test (oid int PRIMARY KEY);
CREATE UNIQUE INDEX ON test((oid*oid));
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c1.oid=c2.oid AND c1.oid*c2.oid=c3.oid*c3.oid;
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c1.oid=c3.oid AND c1.oid*c3.oid=c2.oid*c2.oid;
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c3.oid=c2.oid AND c3.oid*c2.oid=c1.oid*c1.oid;
Ok, I understood. Thank you for explanation.
--
Regards,
Alena Rybakina
On 13/10/2023 15:56, a.rybakina wrote:
Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found. It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).Agree. I wouldn't say I like it too. But also, I suggest skipping some
unnecessary assertions proposed in that patch:
Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all the
negative numbers, at least?
Assert(is_opclause(orinfo->clause)); - above we skip clauses with
rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already
checked as is_opclause.
All these changes (see in the attachment) are optional.I don't mind about asserts, maybe I misunderstood something in the patch.
About skipping SJ removal when no SJ quals is found, I assume it is
about it:split_selfjoin_quals(root, restrictlist, &selfjoinquals,
&otherjoinquals, inner->relid,
outer->relid);+ if (list_length(selfjoinquals) == 0) + { + /* + * XXX: + * we would detect self-join without quals like 'x==x' if we had + * an foreign key constraint on some of other quals and this join + * haven't any columns from the outer in the target list. + * But it is still complex task. + */ + continue; + }as far as I remember, this is the place where it is checked that the SJ
list is empty and it is logical, in my opinion, that no transformations
should be performed if no elements are found for them.
You forget we have "Degenerate" case, as Alexander mentioned above. What
if you have something like that:
SELECT ... FROM A a1, A a2 WHERE a1.id=1 AND a2.id=1;
In this case, uniqueness can be achieved by the baserestrictinfo
"A.id=1", if we have an unique index on this column.
--
regards,
Andrey Lepikhov
Postgres Professional
On 13.10.2023 12:03, Andrei Lepikhov wrote:
On 13/10/2023 15:56, a.rybakina wrote:
Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found. It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).Agree. I wouldn't say I like it too. But also, I suggest skipping
some unnecessary assertions proposed in that patch:
Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all
the negative numbers, at least?
Assert(is_opclause(orinfo->clause)); - above we skip clauses with
rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already
checked as is_opclause.
All these changes (see in the attachment) are optional.I don't mind about asserts, maybe I misunderstood something in the
patch.About skipping SJ removal when no SJ quals is found, I assume it is
about it:split_selfjoin_quals(root, restrictlist, &selfjoinquals,
&otherjoinquals, inner->relid,
outer->relid);+ if (list_length(selfjoinquals) == 0) + { + /* + * XXX: + * we would detect self-join without quals like 'x==x' if we had + * an foreign key constraint on some of other quals and this join + * haven't any columns from the outer in the target list. + * But it is still complex task. + */ + continue; + }as far as I remember, this is the place where it is checked that the
SJ list is empty and it is logical, in my opinion, that no
transformations should be performed if no elements are found for them.You forget we have "Degenerate" case, as Alexander mentioned above.
What if you have something like that:
SELECT ... FROM A a1, A a2 WHERE a1.id=1 AND a2.id=1;
In this case, uniqueness can be achieved by the baserestrictinfo
"A.id=1", if we have an unique index on this column.
Yes, sorry, I missed it. thanks again for the explanation 🙂
On 12/10/2023 18:32, Alexander Korotkov wrote:
On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:On 4/10/2023 14:34, Alexander Korotkov wrote:
Relid replacement machinery is the most contradictory code here. We used
a utilitarian approach and implemented a simplistic variant.2) It would be nice to skip the insertion of IS NOT NULL checks when
they are not necessary. [1] points that infrastructure from [2] might
be useful. The patchset from [2] seems committed mow. However, I
can't see it is directly helpful in this matter. Could we just skip
adding IS NOT NULL clause for the columns, that have
pg_attribute.attnotnull set?Thanks for the links, I will look into that case.
To be more precise, in the attachment, you can find a diff to the main
patch, which shows the volume of changes to achieve the desired behaviour.
Some explains in regression tests shifted. So, I've made additional tests:DROP TABLE test CASCADE;
CREATE TABLE test (a int, b int not null);
CREATE UNIQUE INDEX abc ON test(b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
CREATE UNIQUE INDEX abc1 ON test(a,b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
DROP INDEX abc1;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);We have almost the results we wanted to have. But in the last explain
you can see that nothing happened with the OR clause. We should use the
expression mutator instead of walker to handle such clauses. But It
doesn't process the RestrictInfo node ... I'm inclined to put a solution
of this issue off for a while.OK. I think it doesn't worth to eliminate IS NULL quals with this
complexity (at least at this stage of work).I made improvements over the code. Mostly new comments, grammar
corrections of existing comments and small refactoring.Also, I found that the suggestion from David Rowley [1] to qsort
array of relations to faster find duplicates is still unaddressed.
I've implemented it. That helps to evade quadratic complexity with
large number of relations.Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found. It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).
I would like to propose one more minor improvement (see in attachment).
The idea here is that after removing a self-join and changing clauses we
should re-probe the set of relids with the same Oid, because we can find
more removable self-joins (see the demo test in join.sql).
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
iterative_elimination.difftext/plain; charset=UTF-8; name=iterative_elimination.diffDownload
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 7b8dc7a2b7..f7ccda5231 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -2298,6 +2298,7 @@ remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
{
/* Create group of relation indexes with the same oid */
Relids group = NULL;
+ Relids removed;
while (i < j)
{
@@ -2306,8 +2307,21 @@ remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
}
relids = bms_del_members(relids, group);
- toRemove = bms_add_members(toRemove,
- remove_self_joins_one_group(root, group));
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make next attempt iteratively - if something is deleted from
+ * a group, changes in clauses and equivalence classes can give
+ * us a chance to find more candidates.
+ */
+ do {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
bms_free(group);
}
else
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 12a90bd42e..cb2429645c 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6786,6 +6786,36 @@ SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
(3 rows)
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Aggregate (cost=43.84..43.85 rows=1 width=8)
+ -> Seq Scan on emp1 c3 (cost=0.00..38.25 rows=2237 width=0)
+ Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
+(3 rows)
+
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Aggregate (cost=43.84..43.85 rows=1 width=8)
+ -> Seq Scan on emp1 c3 (cost=0.00..38.25 rows=2237 width=0)
+ Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
+(3 rows)
+
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Aggregate (cost=43.84..43.85 rows=1 width=8)
+ -> Seq Scan on emp1 c3 (cost=0.00..38.25 rows=2237 width=0)
+ Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
+(3 rows)
+
-- We can remove the join even if we find the join can't duplicate rows and
-- the base quals of each side are different. In the following case we end up
-- moving quals over to s1 to make it so it can't match any rows.
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 4d49c0767a..55147263ca 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2576,6 +2576,16 @@ CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
explain (verbose, costs off)
SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
-- We can remove the join even if we find the join can't duplicate rows and
-- the base quals of each side are different. In the following case we end up
-- moving quals over to s1 to make it so it can't match any rows.
On Mon, Oct 16, 2023 at 11:28 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
On 12/10/2023 18:32, Alexander Korotkov wrote:
On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:On 4/10/2023 14:34, Alexander Korotkov wrote:
Relid replacement machinery is the most contradictory code here. We used
a utilitarian approach and implemented a simplistic variant.2) It would be nice to skip the insertion of IS NOT NULL checks when
they are not necessary. [1] points that infrastructure from [2] might
be useful. The patchset from [2] seems committed mow. However, I
can't see it is directly helpful in this matter. Could we just skip
adding IS NOT NULL clause for the columns, that have
pg_attribute.attnotnull set?Thanks for the links, I will look into that case.
To be more precise, in the attachment, you can find a diff to the main
patch, which shows the volume of changes to achieve the desired behaviour.
Some explains in regression tests shifted. So, I've made additional tests:DROP TABLE test CASCADE;
CREATE TABLE test (a int, b int not null);
CREATE UNIQUE INDEX abc ON test(b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
CREATE UNIQUE INDEX abc1 ON test(a,b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
DROP INDEX abc1;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);We have almost the results we wanted to have. But in the last explain
you can see that nothing happened with the OR clause. We should use the
expression mutator instead of walker to handle such clauses. But It
doesn't process the RestrictInfo node ... I'm inclined to put a solution
of this issue off for a while.OK. I think it doesn't worth to eliminate IS NULL quals with this
complexity (at least at this stage of work).I made improvements over the code. Mostly new comments, grammar
corrections of existing comments and small refactoring.Also, I found that the suggestion from David Rowley [1] to qsort
array of relations to faster find duplicates is still unaddressed.
I've implemented it. That helps to evade quadratic complexity with
large number of relations.Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found. It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).I would like to propose one more minor improvement (see in attachment).
The idea here is that after removing a self-join and changing clauses we
should re-probe the set of relids with the same Oid, because we can find
more removable self-joins (see the demo test in join.sql).
Thank you, I've integrated this into the patch. BTW, the patch
introduces two new GUC variables: enable_self_join_removal,
self_join_search_limit. enable_self_join_removal variable turns
on/off optimization at all. self_join_search_limit variable limits
its usage by the number of joins. AFICS, self_join_search_limit is
intended to protect us from quadratic complexity self-join removal
has. I tried to reproduce the extreme case.
SELECT count(*) FROM pgbench_accounts a0, pgbench_accounts a1, ...,
pgbench_accounts a100 WHERE a0.aid = 1 AND a1.aid = a0.aid + 1 AND ...
AND a100.aid = a99.aid + 1;
This query took 3778.432 ms with self-join removal disabled, and
3756.009 ms with self-join removal enabled. So, no measurable
overhead. Similar to the higher number of joins. Can you imagine
some extreme case when self-join removal could introduce significant
overhead in comparison with other optimizer parts? If not, should we
remove self_join_search_limit GUC?
------
Regards,
Alexander Korotkov
Attachments:
0001-Remove-useless-self-joins-v45.patchapplication/octet-stream; name=0001-Remove-useless-self-joins-v45.patchDownload
From ee8105a5b80016c5d1bf465b64d7ac894fd56c68 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Tue, 12 Sep 2023 18:25:51 +0700
Subject: [PATCH] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1201 ++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/utils/misc/guc_tables.c | 22 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 7 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 869 +++++++++++-
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/expected/updatable_views.out | 17 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 375 +++++
src/tools/pgindent/typedefs.list | 3 +
14 files changed, 2540 insertions(+), 68 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3839c72c868..906d0a70076 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5306,6 +5306,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ query tree and replaces self joins with semantically equivalent single
+ scans. Take into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 6a93d767a5c..379563e87a7 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3549,6 +3565,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3600,6 +3617,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniquiness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3642,7 +3677,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 5f3cce873a0..0dfae11b27f 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,8 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
+#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -33,10 +35,46 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * 'extra_clauses' contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for the SJ checking procedure: SJ
+ * can be removed if the outer relation contains strictly the same set of
+ * clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+/*
+ * The context for replace_varno_walker() containing source and target relids2
+ */
+typedef struct
+{
+ int from;
+ int to;
+} ReplaceVarnoContext;
+
+/*
+ * The struct containing self-join candidate. Used to find duplicate reloids.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_removal;
+int self_join_search_limit;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -44,14 +82,20 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static Bitmapset *replace_relid(Relids relids, int oldId, int newId);
+static void replace_varno(Node *node, int from, int to);
+static bool replace_varno_walker(Node *node, ReplaceVarnoContext *ctx);
+static int self_join_candidates_cmp(const void *a, const void *b);
+
/*
@@ -89,7 +133,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -308,7 +352,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -320,29 +364,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Remove references to the rel from other baserels' attr_needed arrays.
*/
@@ -366,19 +405,22 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ replace_relid(otherrel->attr_needed[attroff], relid, subst);
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], ojrelid);
+ replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
+
+ /* Update lateral references. */
+ replace_varno((Node *) otherrel->lateral_vars, relid, subst);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -392,19 +434,21 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst);
}
/*
@@ -438,18 +482,47 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ phinfo->ph_var->phrels = replace_relid(phinfo->ph_var->phrels, relid, subst);
Assert(!bms_is_empty(phv->phrels));
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
/*
* Remove any joinquals referencing the rel from the joininfo lists.
@@ -844,9 +917,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the outer_exprs, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +937,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,9 +1255,30 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also return additional clauses
+ * from a baserestrictinfo list that were used to prove uniqueness.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1200,10 +1300,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1324,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1337,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1391,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1324,5 +1434,988 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Replace each occurence of removing relid with the keeping one
+ */
+static void
+replace_varno(Node *node, int from, int to)
+{
+ ReplaceVarnoContext ctx;
+
+ if (to <= 0)
+ return;
+
+ ctx.from = from;
+ ctx.to = to;
+ replace_varno_walker(node, &ctx);
+}
+
+/*
+ * Walker function for replace_varno()
+ */
+static bool
+replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+
+ if (bms_is_member(ctx->from, rinfo->clause_relids))
+ {
+ replace_varno((Node *) rinfo->clause, ctx->from, ctx->to);
+ replace_varno((Node *) rinfo->orclause, ctx->from, ctx->to);
+ rinfo->clause_relids = replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
+ rinfo->left_relids = replace_relid(rinfo->left_relids, ctx->from, ctx->to);
+ rinfo->right_relids = replace_relid(rinfo->right_relids, ctx->from, ctx->to);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = replace_relid(rinfo->required_relids, ctx->from, ctx->to);
+
+ rinfo->outer_relids = replace_relid(rinfo->outer_relids, ctx->from, ctx->to);
+ rinfo->incompatible_relids = replace_relid(rinfo->incompatible_relids, ctx->from, ctx->to);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ relid == ctx->to && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+
+ return false;
+ }
+ return expression_tree_walker(node, replace_varno_walker, (void *) ctx);
+}
+
+/*
+ * Substitute newId by oldId in relids.
+ */
+static Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (oldId < 0)
+ return relids;
+
+ if (newId < 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(relids, oldId), newId);
+
+ return relids;
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+ ListCell *lc;
+ ListCell *lc1;
+
+ foreach(lc, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ replace_varno((Node *) em->em_expr, from, to);
+
+ foreach(lc1, new_members)
+ {
+ EquivalenceMember *other = lfirst_node(EquivalenceMember, lc1);
+
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach(lc, ec->ec_sources)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ replace_varno((Node *) rinfo, from, to);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach(lc1, new_sources)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, lc1);
+
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+static bool
+sje_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ return expression_tree_walker(node, sje_walker, (void *) ctx);
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs, because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+ ReplaceVarnoContext ctx = {.from = toRemove->relid,.to = toKeep->relid};
+
+ Assert(toKeep->relid != -1);
+
+ /*
+ * Replace index of removing table with the keeping one. The technique of
+ * removing/distributing restrictinfo is used here to attach just appeared
+ * (for keeping relation) join clauses and avoid of adding duplicates of
+ * those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach(lc, joininfos)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach(lc, toRemove->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side we reduce length of restrict
+ * lists that can impact planning or executing time. Additionally, we
+ * improve accuracy of cardinality estimation. On the other side, it is
+ * one more place which can make planning time much longer in specific
+ * cases. It would have been better to avoid calling the equal() function
+ * here, but it's only way to detect duplicated inequality expressions.
+ */
+
+ foreach(lc, binfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach(lc, jinfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->joininfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing table, with the keeping
+ * one: varno of removin table should be replaced in members and sources
+ * lists. Also, remove duplicated elements if this replacement procedure
+ * created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+
+ replace_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(
+ toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures */
+ query_tree_walker(root->parse, sje_walker, &ctx, QTW_EXAMINE_SORTGROUP);
+
+ /* At last, replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+ /* replace varno in root targetlist and HAVING clause */
+ replace_varno((Node *) root->processed_tlist,
+ toRemove->relid, toKeep->relid);
+ replace_varno((Node *) root->processed_groupClause,
+ toRemove->relid, toKeep->relid);
+ /* ... and remove the rel from the baserel array */
+ root->simple_rel_array[toRemove->relid] = NULL;
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ replace_varno(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ replace_varno((Node *) clause, relid, outer->relid);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with corresponding sides of the
+ * outer's filters. If no one detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * Be safe to do not remove table participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+ if (lc)
+ continue;
+
+ /*
+ * At this stage joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence on this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+
+ /*
+ * Process restrictlist to seperate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list.
+ * Degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are selfjoin quals. Otherwise we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins. To avoid complexity, limit the max power of this set by a GUC.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ RangeTblRef *ref = (RangeTblRef *) jlnode;
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ root->simple_rel_array[ref->rtindex] != NULL)
+ {
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+ }
+ }
+ else if (IsA(jlnode, List))
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+
+ /*
+ * Limit the number of joins we process to control the quadratic
+ * behavior.
+ */
+ if (bms_num_members(relids) >= self_join_search_limit)
+ break;
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ pg_qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join cadidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clause sare assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead,
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of query. If it has any references
+ * to the removed relations, we update them to point to the remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ int nremoved = 0;
+
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index fcc0eacd253..be5ef79af39 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -231,6 +231,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 4c585741661..2018b2d22d9 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1027,6 +1027,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
@@ -2078,6 +2088,18 @@ struct config_int ConfigureNamesInt[] =
8, 1, INT_MAX,
NULL, NULL, NULL
},
+ {
+ {"self_join_search_limit", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Max size of a group of plain tables where planner "
+ "will lookup for self joins."),
+ gettext_noop("If a number of tables in FROM-list is exceeded this "
+ "limit, the planner will ignore tables beyond it."),
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &self_join_search_limit,
+ 32, 1, INT_MAX,
+ NULL, NULL, NULL
+ },
{
{"geqo_threshold", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Sets the threshold of FROM items beyond which GEQO is used."),
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7b896d821e8..2d8dbba7cd3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,6 +71,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 31c188176b7..5f45ffda57e 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,8 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
+extern PGDLLIMPORT int self_join_search_limit;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -104,6 +106,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fed..de714410524 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index b95d30f6586..1f157a615a4 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5805,11 +5805,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)
-- bug 5255: this is not optimizable by join removal
begin;
@@ -6132,6 +6134,867 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+---------------------------------
+ Result
+ One-Time Filter: false
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(4 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+---------------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: ((oid < '10'::oid) AND (oid IS NOT NULL))
+(6 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = (a * a))
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+-------------------------------------------
+ Nested Loop
+ Join Filter: (j1.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Nested Loop
+ Join Filter: (j1.a = j2.a)
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(9 rows)
+
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j2.a = j3.a)
+ -> Seq Scan on sj j3
+ -> Materialize
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj j3
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT c.oid, e.oid FROM pg_class c FULL JOIN (
+ SELECT e1.oid FROM pg_extension e1, pg_extension e2
+ WHERE e1.oid=e2.oid) AS e
+ ON c.oid=e.oid;
+ QUERY PLAN
+-----------------------------------------
+ Hash Full Join
+ Hash Cond: (c.oid = e2.oid)
+ -> Seq Scan on pg_class c
+ -> Hash
+ -> Seq Scan on pg_extension e2
+ Filter: (oid IS NOT NULL)
+(6 rows)
+
+reset self_join_search_limit;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Aggregate (cost=43.84..43.85 rows=1 width=8)
+ -> Seq Scan on emp1 c3 (cost=0.00..38.25 rows=2237 width=0)
+ Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
+(3 rows)
+
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Aggregate (cost=43.84..43.85 rows=1 width=8)
+ -> Seq Scan on emp1 c3 (cost=0.00..38.25 rows=2237 width=0)
+ Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
+(3 rows)
+
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Aggregate (cost=43.84..43.85 rows=1 width=8)
+ -> Seq Scan on emp1 c3 (cost=0.00..38.25 rows=2237 width=0)
+ Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index aae5d51e1c9..271313ebf86 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1950e6f281f..a73c1f90c4a 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2499,16 +2499,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
- Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ QUERY PLAN
+--------------------------------------------------
+ Update on base_tbl
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a31055..77dd964ebf2 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3e5032b04dd..55147263ca8 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2309,6 +2309,381 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+set self_join_search_limit to 1;
+explain (costs off) -- No one self join should be collapsed
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 2;
+explain (costs off) -- Collapse one self join
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+set self_join_search_limit to 3;
+explain (costs off) -- Collapse both self joins
+ select j1.* from sj j1, sj j2, sj j3 where (j1.a = j2.a) and (j2.a = j3.a);
+EXPLAIN (COSTS OFF)
+SELECT c.oid, e.oid FROM pg_class c FULL JOIN (
+ SELECT e1.oid FROM pg_extension e1, pg_extension e2
+ WHERE e1.oid=e2.oid) AS e
+ ON c.oid=e.oid;
+reset self_join_search_limit;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e69bb671bf8..82508f5ff5f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -367,6 +367,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2473,6 +2474,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -3834,6 +3836,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.3 (Apple Git-145)
On 19/10/2023 01:50, Alexander Korotkov wrote:
On Mon, Oct 16, 2023 at 11:28 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:On 12/10/2023 18:32, Alexander Korotkov wrote:
On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:On 4/10/2023 14:34, Alexander Korotkov wrote:
Relid replacement machinery is the most contradictory code here. We used
a utilitarian approach and implemented a simplistic variant.2) It would be nice to skip the insertion of IS NOT NULL checks when
they are not necessary. [1] points that infrastructure from [2] might
be useful. The patchset from [2] seems committed mow. However, I
can't see it is directly helpful in this matter. Could we just skip
adding IS NOT NULL clause for the columns, that have
pg_attribute.attnotnull set?Thanks for the links, I will look into that case.
To be more precise, in the attachment, you can find a diff to the main
patch, which shows the volume of changes to achieve the desired behaviour.
Some explains in regression tests shifted. So, I've made additional tests:DROP TABLE test CASCADE;
CREATE TABLE test (a int, b int not null);
CREATE UNIQUE INDEX abc ON test(b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
CREATE UNIQUE INDEX abc1 ON test(a,b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
DROP INDEX abc1;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);We have almost the results we wanted to have. But in the last explain
you can see that nothing happened with the OR clause. We should use the
expression mutator instead of walker to handle such clauses. But It
doesn't process the RestrictInfo node ... I'm inclined to put a solution
of this issue off for a while.OK. I think it doesn't worth to eliminate IS NULL quals with this
complexity (at least at this stage of work).I made improvements over the code. Mostly new comments, grammar
corrections of existing comments and small refactoring.Also, I found that the suggestion from David Rowley [1] to qsort
array of relations to faster find duplicates is still unaddressed.
I've implemented it. That helps to evade quadratic complexity with
large number of relations.Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found. It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).I would like to propose one more minor improvement (see in attachment).
The idea here is that after removing a self-join and changing clauses we
should re-probe the set of relids with the same Oid, because we can find
more removable self-joins (see the demo test in join.sql).Thank you, I've integrated this into the patch. BTW, the patch
introduces two new GUC variables: enable_self_join_removal,
self_join_search_limit. enable_self_join_removal variable turns
on/off optimization at all. self_join_search_limit variable limits
its usage by the number of joins. AFICS, self_join_search_limit is
intended to protect us from quadratic complexity self-join removal
has. I tried to reproduce the extreme case.SELECT count(*) FROM pgbench_accounts a0, pgbench_accounts a1, ...,
pgbench_accounts a100 WHERE a0.aid = 1 AND a1.aid = a0.aid + 1 AND ...
AND a100.aid = a99.aid + 1;This query took 3778.432 ms with self-join removal disabled, and
3756.009 ms with self-join removal enabled. So, no measurable
overhead. Similar to the higher number of joins. Can you imagine
some extreme case when self-join removal could introduce significant
overhead in comparison with other optimizer parts? If not, should we
remove self_join_search_limit GUC?
Thanks,
It was Zhihong Yu who worried about that case [1]/messages/by-id/CALNJ-vTyL-LpvSOPZxpC63Et3LJLUAFZSfRqGEhT5Rj7_EEj7w@mail.gmail.com. And my purpose was to
show a method to avoid such a problem if it would be needed.
I guess the main idea here is that we have a lot of self-joins, but only
few of them (or no one) can be removed.
I can't imagine a practical situation when we can be stuck in the
problems here. So, I vote to remove this GUC.
[1]: /messages/by-id/CALNJ-vTyL-LpvSOPZxpC63Et3LJLUAFZSfRqGEhT5Rj7_EEj7w@mail.gmail.com
/messages/by-id/CALNJ-vTyL-LpvSOPZxpC63Et3LJLUAFZSfRqGEhT5Rj7_EEj7w@mail.gmail.com
--
regards,
Andrey Lepikhov
Postgres Professional
On Thu, Oct 19, 2023 at 6:16 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
On 19/10/2023 01:50, Alexander Korotkov wrote:
This query took 3778.432 ms with self-join removal disabled, and
3756.009 ms with self-join removal enabled. So, no measurable
overhead. Similar to the higher number of joins. Can you imagine
some extreme case when self-join removal could introduce significant
overhead in comparison with other optimizer parts? If not, should we
remove self_join_search_limit GUC?Thanks,
It was Zhihong Yu who worried about that case [1]. And my purpose was to
show a method to avoid such a problem if it would be needed.
I guess the main idea here is that we have a lot of self-joins, but only
few of them (or no one) can be removed.
I can't imagine a practical situation when we can be stuck in the
problems here. So, I vote to remove this GUC.
I've removed the self_join_search_limit. Anyway there is
enable_self_join_removal if the self join removal algorithm causes any
problems. I also did some grammar corrections for the comments. I
think the patch is getting to the committable shape. I noticed some
failures on commitfest.cputube.org. I'd like to check how this
version will pass it.
------
Regards,
Alexander Korotkov
Attachments:
0001-Remove-useless-self-joins-v46.patchapplication/octet-stream; name=0001-Remove-useless-self-joins-v46.patchDownload
From dd3bb8fed25826430009c24f1bad5e98613017a1 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 22 Oct 2023 00:45:09 +0300
Subject: [PATCH] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1212 ++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/utils/misc/guc_tables.c | 10 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 805 +++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/expected/updatable_views.out | 17 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 359 +++++
src/tools/pgindent/typedefs.list | 3 +
14 files changed, 2458 insertions(+), 68 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3839c72c868..bd138236acc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5306,6 +5306,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 6a93d767a5c..379563e87a7 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3549,6 +3565,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3600,6 +3617,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniquiness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3642,7 +3677,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 5f3cce873a0..6b848aadad3 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,8 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
+#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -33,10 +35,45 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * 'extra_clauses' contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for the SJ checking procedure: SJ
+ * can be removed if the outer relation contains strictly the same set of
+ * clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+/*
+ * The context for replace_varno_walker() containing source and target relids2
+ */
+typedef struct
+{
+ int from;
+ int to;
+} ReplaceVarnoContext;
+
+/*
+ * The struct containing self-join candidate. Used to find duplicate reloids.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -44,14 +81,20 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static Bitmapset *replace_relid(Relids relids, int oldId, int newId);
+static void replace_varno(Node *node, int from, int to);
+static bool replace_varno_walker(Node *node, ReplaceVarnoContext *ctx);
+static int self_join_candidates_cmp(const void *a, const void *b);
+
/*
@@ -89,7 +132,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -308,7 +351,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -320,29 +363,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Remove references to the rel from other baserels' attr_needed arrays.
*/
@@ -366,19 +404,22 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ replace_relid(otherrel->attr_needed[attroff], relid, subst);
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], ojrelid);
+ replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
+
+ /* Update lateral references. */
+ replace_varno((Node *) otherrel->lateral_vars, relid, subst);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -392,19 +433,21 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst);
}
/*
@@ -438,18 +481,47 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ phinfo->ph_var->phrels = replace_relid(phinfo->ph_var->phrels, relid, subst);
Assert(!bms_is_empty(phv->phrels));
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
/*
* Remove any joinquals referencing the rel from the joininfo lists.
@@ -844,9 +916,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the outer_exprs, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +936,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,9 +1254,30 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also return additional clauses
+ * from a baserestrictinfo list that were used to prove uniqueness.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1200,10 +1299,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1323,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1336,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1390,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1312,17 +1421,1006 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Replace each occurrence of removing relid with the keeping one
+ */
+static void
+replace_varno(Node *node, int from, int to)
+{
+ ReplaceVarnoContext ctx;
+
+ if (to <= 0)
+ return;
+
+ ctx.from = from;
+ ctx.to = to;
+ replace_varno_walker(node, &ctx);
+}
+
+/*
+ * Walker function for replace_varno()
+ */
+static bool
+replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+
+ if (bms_is_member(ctx->from, rinfo->clause_relids))
+ {
+ replace_varno((Node *) rinfo->clause, ctx->from, ctx->to);
+ replace_varno((Node *) rinfo->orclause, ctx->from, ctx->to);
+ rinfo->clause_relids = replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
+ rinfo->left_relids = replace_relid(rinfo->left_relids, ctx->from, ctx->to);
+ rinfo->right_relids = replace_relid(rinfo->right_relids, ctx->from, ctx->to);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = replace_relid(rinfo->required_relids, ctx->from, ctx->to);
+
+ rinfo->outer_relids = replace_relid(rinfo->outer_relids, ctx->from, ctx->to);
+ rinfo->incompatible_relids = replace_relid(rinfo->incompatible_relids, ctx->from, ctx->to);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ relid == ctx->to && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+
+ return false;
+ }
+ return expression_tree_walker(node, replace_varno_walker, (void *) ctx);
+}
+
+/*
+ * Substitute newId by oldId in relids.
+ */
+static Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (oldId < 0)
+ return relids;
+
+ if (newId < 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(relids, oldId), newId);
+
+ return relids;
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+ ListCell *lc;
+ ListCell *lc1;
+
+ foreach(lc, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ replace_varno((Node *) em->em_expr, from, to);
+
+ foreach(lc1, new_members)
+ {
+ EquivalenceMember *other = lfirst_node(EquivalenceMember, lc1);
+
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach(lc, ec->ec_sources)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ replace_varno((Node *) rinfo, from, to);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach(lc1, new_sources)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, lc1);
+
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+static bool
+sje_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ return expression_tree_walker(node, sje_walker, (void *) ctx);
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+ ReplaceVarnoContext ctx = {.from = toRemove->relid,.to = toKeep->relid};
+
+ Assert(toKeep->relid != -1);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach(lc, joininfos)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach(lc, toRemove->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach(lc, binfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach(lc, jinfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->joininfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+
+ replace_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures */
+ query_tree_walker(root->parse, sje_walker, &ctx, QTW_EXAMINE_SORTGROUP);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ replace_varno((Node *) root->processed_tlist,
+ toRemove->relid, toKeep->relid);
+ replace_varno((Node *) root->processed_groupClause,
+ toRemove->relid, toKeep->relid);
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ replace_varno(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ replace_varno((Node *) clause, relid, outer->relid);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * Be safe to do not remove tables participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+ if (lc)
+ continue;
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+
+ /*
+ * Process restrictlist to seperate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are selfjoin quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins. To avoid complexity, limit the max power of this set by a GUC.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ RangeTblRef *ref = (RangeTblRef *) jlnode;
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ root->simple_rel_array[ref->rtindex] != NULL)
+ {
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+ }
+ }
+ else if (IsA(jlnode, List))
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ pg_qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join cadidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ int nremoved = 0;
+
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index fcc0eacd253..be5ef79af39 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -231,6 +231,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 4c585741661..7605eff9b9d 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1027,6 +1027,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7b896d821e8..2d8dbba7cd3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,6 +71,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 31c188176b7..2bc857745a9 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -104,6 +105,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fed..de714410524 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index b95d30f6586..f59a921cdc6 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6132,6 +6132,811 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+---------------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: ((oid < '10'::oid) AND (oid IS NOT NULL))
+(6 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = (a * a))
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Aggregate (cost=43.84..43.85 rows=1 width=8)
+ -> Seq Scan on emp1 c3 (cost=0.00..38.25 rows=2237 width=0)
+ Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
+(3 rows)
+
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Aggregate (cost=43.84..43.85 rows=1 width=8)
+ -> Seq Scan on emp1 c3 (cost=0.00..38.25 rows=2237 width=0)
+ Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
+(3 rows)
+
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Aggregate (cost=43.84..43.85 rows=1 width=8)
+ -> Seq Scan on emp1 c3 (cost=0.00..38.25 rows=2237 width=0)
+ Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index aae5d51e1c9..271313ebf86 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1950e6f281f..a73c1f90c4a 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2499,16 +2499,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
- Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ QUERY PLAN
+--------------------------------------------------
+ Update on base_tbl
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a31055..77dd964ebf2 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3e5032b04dd..f8b02601775 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2309,6 +2309,365 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 06b25617bc9..085d0d7e548 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -367,6 +367,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2473,6 +2474,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -3835,6 +3837,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.3 (Apple Git-145)
On 22/10/2023 05:01, Alexander Korotkov wrote:
On Thu, Oct 19, 2023 at 6:16 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:On 19/10/2023 01:50, Alexander Korotkov wrote:
This query took 3778.432 ms with self-join removal disabled, and
3756.009 ms with self-join removal enabled. So, no measurable
overhead. Similar to the higher number of joins. Can you imagine
some extreme case when self-join removal could introduce significant
overhead in comparison with other optimizer parts? If not, should we
remove self_join_search_limit GUC?Thanks,
It was Zhihong Yu who worried about that case [1]. And my purpose was to
show a method to avoid such a problem if it would be needed.
I guess the main idea here is that we have a lot of self-joins, but only
few of them (or no one) can be removed.
I can't imagine a practical situation when we can be stuck in the
problems here. So, I vote to remove this GUC.I've removed the self_join_search_limit. Anyway there is
enable_self_join_removal if the self join removal algorithm causes any
problems. I also did some grammar corrections for the comments. I
think the patch is getting to the committable shape. I noticed some
failures on commitfest.cputube.org. I'd like to check how this
version will pass it.
I have observed the final patch. A couple of minor changes can be made
(see attachment).
Also, I see room for improvement, but it can be done later. For example,
we limit the optimization to only ordinary tables in this patch. It can
be extended at least with partitioned and foreign tables soon.
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
minor_changes.difftext/plain; charset=UTF-8; name=minor_changes.diffDownload
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 6b848aadad..b84197dadb 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -23,7 +23,6 @@
#include "postgres.h"
#include "catalog/pg_class.h"
-#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -50,7 +49,7 @@ typedef struct UniqueRelInfo
} UniqueRelInfo;
/*
- * The context for replace_varno_walker() containing source and target relids2
+ * The context for replace_varno_walker() containing source and target relids.
*/
typedef struct
{
On Mon, Oct 23, 2023 at 6:43 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
On 22/10/2023 05:01, Alexander Korotkov wrote:
On Thu, Oct 19, 2023 at 6:16 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:On 19/10/2023 01:50, Alexander Korotkov wrote:
This query took 3778.432 ms with self-join removal disabled, and
3756.009 ms with self-join removal enabled. So, no measurable
overhead. Similar to the higher number of joins. Can you imagine
some extreme case when self-join removal could introduce significant
overhead in comparison with other optimizer parts? If not, should we
remove self_join_search_limit GUC?Thanks,
It was Zhihong Yu who worried about that case [1]. And my purpose was to
show a method to avoid such a problem if it would be needed.
I guess the main idea here is that we have a lot of self-joins, but only
few of them (or no one) can be removed.
I can't imagine a practical situation when we can be stuck in the
problems here. So, I vote to remove this GUC.I've removed the self_join_search_limit. Anyway there is
enable_self_join_removal if the self join removal algorithm causes any
problems. I also did some grammar corrections for the comments. I
think the patch is getting to the committable shape. I noticed some
failures on commitfest.cputube.org. I'd like to check how this
version will pass it.I have observed the final patch. A couple of minor changes can be made
(see attachment).
Thank you, Andrei! I've integrated your changes into the patch.
Also, I see room for improvement, but it can be done later. For example,
we limit the optimization to only ordinary tables in this patch. It can
be extended at least with partitioned and foreign tables soon.
Yes, I think it's reasonable to postpone some improvements. It's
important to get the basic feature in, make sure it's safe and stable.
Then we can make improvements incrementally.
I think this patch makes substantial improvement to query planning.
It has received plenty of reviews. The code is currently in quite
good shape. I didn't manage to find the cases when this optimization
causes significant overhead to planning time. Even if such cases will
be spotted there is a GUC option to disable this feature. So, I'll
push this if there are no objections.
------
Regards,
Alexander Korotkov
Attachments:
0001-Remove-useless-self-joins-v47.patchapplication/octet-stream; name=0001-Remove-useless-self-joins-v47.patchDownload
From 21de6277a87b5a69cc446ef282358d46ddb1f921 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 22 Oct 2023 00:45:09 +0300
Subject: [PATCH] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1211 ++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/utils/misc/guc_tables.c | 10 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 805 +++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/expected/updatable_views.out | 17 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 359 +++++
src/tools/pgindent/typedefs.list | 3 +
14 files changed, 2457 insertions(+), 68 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3839c72c868..bd138236acc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5306,6 +5306,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 6a93d767a5c..379563e87a7 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3549,6 +3565,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3600,6 +3617,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniquiness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3642,7 +3677,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 5f3cce873a0..b84197dadbc 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -33,10 +34,45 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * 'extra_clauses' contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for the SJ checking procedure: SJ
+ * can be removed if the outer relation contains strictly the same set of
+ * clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+/*
+ * The context for replace_varno_walker() containing source and target relids.
+ */
+typedef struct
+{
+ int from;
+ int to;
+} ReplaceVarnoContext;
+
+/*
+ * The struct containing self-join candidate. Used to find duplicate reloids.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -44,14 +80,20 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static Bitmapset *replace_relid(Relids relids, int oldId, int newId);
+static void replace_varno(Node *node, int from, int to);
+static bool replace_varno_walker(Node *node, ReplaceVarnoContext *ctx);
+static int self_join_candidates_cmp(const void *a, const void *b);
+
/*
@@ -89,7 +131,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -308,7 +350,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -320,29 +362,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Remove references to the rel from other baserels' attr_needed arrays.
*/
@@ -366,19 +403,22 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ replace_relid(otherrel->attr_needed[attroff], relid, subst);
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], ojrelid);
+ replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
+
+ /* Update lateral references. */
+ replace_varno((Node *) otherrel->lateral_vars, relid, subst);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -392,19 +432,21 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst);
}
/*
@@ -438,18 +480,47 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ phinfo->ph_var->phrels = replace_relid(phinfo->ph_var->phrels, relid, subst);
Assert(!bms_is_empty(phv->phrels));
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
/*
* Remove any joinquals referencing the rel from the joininfo lists.
@@ -844,9 +915,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the outer_exprs, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +935,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,9 +1253,30 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also return additional clauses
+ * from a baserestrictinfo list that were used to prove uniqueness.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1200,10 +1298,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1322,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1335,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1389,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1312,17 +1420,1006 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Replace each occurrence of removing relid with the keeping one
+ */
+static void
+replace_varno(Node *node, int from, int to)
+{
+ ReplaceVarnoContext ctx;
+
+ if (to <= 0)
+ return;
+
+ ctx.from = from;
+ ctx.to = to;
+ replace_varno_walker(node, &ctx);
+}
+
+/*
+ * Walker function for replace_varno()
+ */
+static bool
+replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+
+ if (bms_is_member(ctx->from, rinfo->clause_relids))
+ {
+ replace_varno((Node *) rinfo->clause, ctx->from, ctx->to);
+ replace_varno((Node *) rinfo->orclause, ctx->from, ctx->to);
+ rinfo->clause_relids = replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
+ rinfo->left_relids = replace_relid(rinfo->left_relids, ctx->from, ctx->to);
+ rinfo->right_relids = replace_relid(rinfo->right_relids, ctx->from, ctx->to);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = replace_relid(rinfo->required_relids, ctx->from, ctx->to);
+
+ rinfo->outer_relids = replace_relid(rinfo->outer_relids, ctx->from, ctx->to);
+ rinfo->incompatible_relids = replace_relid(rinfo->incompatible_relids, ctx->from, ctx->to);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ relid == ctx->to && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+
+ return false;
+ }
+ return expression_tree_walker(node, replace_varno_walker, (void *) ctx);
+}
+
+/*
+ * Substitute newId by oldId in relids.
+ */
+static Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (oldId < 0)
+ return relids;
+
+ if (newId < 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(relids, oldId), newId);
+
+ return relids;
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+ ListCell *lc;
+ ListCell *lc1;
+
+ foreach(lc, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ replace_varno((Node *) em->em_expr, from, to);
+
+ foreach(lc1, new_members)
+ {
+ EquivalenceMember *other = lfirst_node(EquivalenceMember, lc1);
+
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach(lc, ec->ec_sources)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ replace_varno((Node *) rinfo, from, to);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach(lc1, new_sources)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, lc1);
+
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+static bool
+sje_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ return expression_tree_walker(node, sje_walker, (void *) ctx);
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+ ReplaceVarnoContext ctx = {.from = toRemove->relid,.to = toKeep->relid};
+
+ Assert(toKeep->relid != -1);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach(lc, joininfos)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach(lc, toRemove->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach(lc, binfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach(lc, jinfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->joininfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+
+ replace_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures */
+ query_tree_walker(root->parse, sje_walker, &ctx, QTW_EXAMINE_SORTGROUP);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ replace_varno((Node *) root->processed_tlist,
+ toRemove->relid, toKeep->relid);
+ replace_varno((Node *) root->processed_groupClause,
+ toRemove->relid, toKeep->relid);
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ replace_varno(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ replace_varno((Node *) clause, relid, outer->relid);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * Be safe to do not remove tables participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+ if (lc)
+ continue;
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+
+ /*
+ * Process restrictlist to seperate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are selfjoin quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins. To avoid complexity, limit the max power of this set by a GUC.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ RangeTblRef *ref = (RangeTblRef *) jlnode;
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ root->simple_rel_array[ref->rtindex] != NULL)
+ {
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+ }
+ }
+ else if (IsA(jlnode, List))
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ pg_qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join cadidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ int nremoved = 0;
+
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index fcc0eacd253..be5ef79af39 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -231,6 +231,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 4c585741661..7605eff9b9d 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1027,6 +1027,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7b896d821e8..2d8dbba7cd3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,6 +71,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 31c188176b7..2bc857745a9 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -104,6 +105,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fed..de714410524 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index b95d30f6586..f59a921cdc6 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6132,6 +6132,811 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+---------------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: ((oid < '10'::oid) AND (oid IS NOT NULL))
+(6 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = (a * a))
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Aggregate (cost=43.84..43.85 rows=1 width=8)
+ -> Seq Scan on emp1 c3 (cost=0.00..38.25 rows=2237 width=0)
+ Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
+(3 rows)
+
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Aggregate (cost=43.84..43.85 rows=1 width=8)
+ -> Seq Scan on emp1 c3 (cost=0.00..38.25 rows=2237 width=0)
+ Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
+(3 rows)
+
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Aggregate (cost=43.84..43.85 rows=1 width=8)
+ -> Seq Scan on emp1 c3 (cost=0.00..38.25 rows=2237 width=0)
+ Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index aae5d51e1c9..271313ebf86 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1950e6f281f..a73c1f90c4a 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2499,16 +2499,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
- Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ QUERY PLAN
+--------------------------------------------------
+ Update on base_tbl
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a31055..77dd964ebf2 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3e5032b04dd..f8b02601775 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2309,6 +2309,365 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 06b25617bc9..085d0d7e548 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -367,6 +367,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2473,6 +2474,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -3835,6 +3837,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.3 (Apple Git-145)
Hi Alexander,
23.10.2023 12:47, Alexander Korotkov wrote:
I think this patch makes substantial improvement to query planning.
It has received plenty of reviews. The code is currently in quite
good shape. I didn't manage to find the cases when this optimization
causes significant overhead to planning time. Even if such cases will
be spotted there is a GUC option to disable this feature. So, I'll
push this if there are no objections.
On a quick glance, I've noticed following typos/inconsistencies in the
patch, which maybe worth fixing:
s/cadidates/candidates/
s/uniquiness/uniqueness/
s/selfjoin/self-join/
s/seperate/separate/
Also, shouldn't the reference "see generate_implied_equalities" be
"see generate_implied_equalities_for_column"?
Best regards,
Alexander
On Mon, Oct 23, 2023 at 2:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
23.10.2023 12:47, Alexander Korotkov wrote:
I think this patch makes substantial improvement to query planning.
It has received plenty of reviews. The code is currently in quite
good shape. I didn't manage to find the cases when this optimization
causes significant overhead to planning time. Even if such cases will
be spotted there is a GUC option to disable this feature. So, I'll
push this if there are no objections.On a quick glance, I've noticed following typos/inconsistencies in the
patch, which maybe worth fixing:
s/cadidates/candidates/
s/uniquiness/uniqueness/
s/selfjoin/self-join/
s/seperate/separate/Also, shouldn't the reference "see generate_implied_equalities" be
"see generate_implied_equalities_for_column"?
Fixed all of the above. Thank you for catching this!
------
Regards,
Alexander Korotkov
Attachments:
0001-Remove-useless-self-joins-v48.patchapplication/octet-stream; name=0001-Remove-useless-self-joins-v48.patchDownload
From 17c10750387fc2d23108a5baf9bc4952eb2e82f9 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 22 Oct 2023 00:45:09 +0300
Subject: [PATCH] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1211 ++++++++++++++++-
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/utils/misc/guc_tables.c | 10 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/test/regress/expected/equivclass.out | 32 +
src/test/regress/expected/join.out | 805 +++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/expected/updatable_views.out | 17 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 359 +++++
src/tools/pgindent/typedefs.list | 3 +
14 files changed, 2457 insertions(+), 68 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3839c72c868..bd138236acc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5306,6 +5306,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 6a93d767a5c..01e773b91d8 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3549,6 +3565,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3600,6 +3617,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3642,7 +3677,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 5f3cce873a0..2e14268e188 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
#include "optimizer/joininfo.h"
@@ -33,10 +34,45 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s) specified by outerrelids.
+ * 'extra_clauses' contains additional clauses from a baserestrictinfo list that
+ * were used to prove uniqueness. We cache it for the SJ checking procedure: SJ
+ * can be removed if the outer relation contains strictly the same set of
+ * clauses.
+ */
+typedef struct UniqueRelInfo
+{
+ Relids outerrelids;
+ List *extra_clauses;
+} UniqueRelInfo;
+
+/*
+ * The context for replace_varno_walker() containing source and target relids.
+ */
+typedef struct
+{
+ int from;
+ int to;
+} ReplaceVarnoContext;
+
+/*
+ * The struct containing self-join candidate. Used to find duplicate reloids.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -44,14 +80,20 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static Bitmapset *replace_relid(Relids relids, int oldId, int newId);
+static void replace_varno(Node *node, int from, int to);
+static bool replace_varno_walker(Node *node, ReplaceVarnoContext *ctx);
+static int self_join_candidates_cmp(const void *a, const void *b);
+
/*
@@ -89,7 +131,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -308,7 +350,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -320,29 +362,24 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Remove references to the rel from other baserels' attr_needed arrays.
*/
@@ -366,19 +403,22 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ replace_relid(otherrel->attr_needed[attroff], relid, subst);
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], ojrelid);
+ replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
+
+ /* Update lateral references. */
+ replace_varno((Node *) otherrel->lateral_vars, relid, subst);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -392,19 +432,21 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst);
}
/*
@@ -438,18 +480,47 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ phinfo->ph_var->phrels = replace_relid(phinfo->ph_var->phrels, relid, subst);
Assert(!bms_is_empty(phv->phrels));
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
/*
* Remove any joinquals referencing the rel from the joininfo lists.
@@ -844,9 +915,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the outer_exprs, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +935,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,9 +1253,30 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also return additional clauses
+ * from a baserestrictinfo list that were used to prove uniqueness.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1200,10 +1298,14 @@ innerrel_is_unique(PlannerInfo *root,
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if (bms_is_subset(uniqueRelInfo->outerrelids, outerrelids))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1322,7 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist, &outer_exprs))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1335,15 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = palloc(sizeof(UniqueRelInfo));
+ uniqueRelInfo->extra_clauses = outer_exprs;
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1389,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1312,17 +1420,1006 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Replace each occurrence of removing relid with the keeping one
+ */
+static void
+replace_varno(Node *node, int from, int to)
+{
+ ReplaceVarnoContext ctx;
+
+ if (to <= 0)
+ return;
+
+ ctx.from = from;
+ ctx.to = to;
+ replace_varno_walker(node, &ctx);
+}
+
+/*
+ * Walker function for replace_varno()
+ */
+static bool
+replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+
+ if (bms_is_member(ctx->from, rinfo->clause_relids))
+ {
+ replace_varno((Node *) rinfo->clause, ctx->from, ctx->to);
+ replace_varno((Node *) rinfo->orclause, ctx->from, ctx->to);
+ rinfo->clause_relids = replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
+ rinfo->left_relids = replace_relid(rinfo->left_relids, ctx->from, ctx->to);
+ rinfo->right_relids = replace_relid(rinfo->right_relids, ctx->from, ctx->to);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids = replace_relid(rinfo->required_relids, ctx->from, ctx->to);
+
+ rinfo->outer_relids = replace_relid(rinfo->outer_relids, ctx->from, ctx->to);
+ rinfo->incompatible_relids = replace_relid(rinfo->incompatible_relids, ctx->from, ctx->to);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ relid == ctx->to && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+
+ return false;
+ }
+ return expression_tree_walker(node, replace_varno_walker, (void *) ctx);
+}
+
+/*
+ * Substitute newId by oldId in relids.
+ */
+static Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (oldId < 0)
+ return relids;
+
+ if (newId < 0)
+ /* Delete relid without substitution. */
+ return bms_del_member(relids, oldId);
+
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(relids, oldId), newId);
+
+ return relids;
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+ ListCell *lc;
+ ListCell *lc1;
+
+ foreach(lc, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ replace_varno((Node *) em->em_expr, from, to);
+
+ foreach(lc1, new_members)
+ {
+ EquivalenceMember *other = lfirst_node(EquivalenceMember, lc1);
+
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach(lc, ec->ec_sources)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ replace_varno((Node *) rinfo, from, to);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach(lc1, new_sources)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, lc1);
+
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+static bool
+sje_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ return expression_tree_walker(node, sje_walker, (void *) ctx);
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+ ReplaceVarnoContext ctx = {.from = toRemove->relid,.to = toKeep->relid};
+
+ Assert(toKeep->relid != -1);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach(lc, joininfos)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach(lc, toRemove->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach(lc, binfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach(lc, jinfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->joininfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || equal(rinfo->clause, src->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+
+ replace_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures */
+ query_tree_walker(root->parse, sje_walker, &ctx, QTW_EXAMINE_SORTGROUP);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ replace_varno((Node *) root->processed_tlist,
+ toRemove->relid, toKeep->relid);
+ replace_varno((Node *) root->processed_groupClause,
+ toRemove->relid, toKeep->relid);
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ replace_varno(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ replace_varno((Node *) clause, relid, outer->relid);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * Be safe to do not remove tables participated in complicated PH
+ */
+ foreach(lc, root->placeholder_list)
+ {
+ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
+
+ /* there isn't any other place to eval PHV */
+ if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
+ bms_is_subset(phinfo->ph_needed, joinrelids))
+ break;
+ }
+ if (lc)
+ continue;
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins. To avoid complexity, limit the max power of this set by a GUC.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ RangeTblRef *ref = (RangeTblRef *) jlnode;
+ RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+
+ /*
+ * We only care about base relations from which we select
+ * something.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ root->simple_rel_array[ref->rtindex] != NULL)
+ {
+ Assert(!bms_is_member(ref->rtindex, relids));
+ relids = bms_add_member(relids, ref->rtindex);
+ }
+ }
+ else if (IsA(jlnode, List))
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ pg_qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'targetlist' is the top-level targetlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ int nremoved = 0;
+
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index fcc0eacd253..be5ef79af39 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -231,6 +231,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 4c585741661..7605eff9b9d 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1027,6 +1027,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 7b896d821e8..2d8dbba7cd3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,6 +71,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 31c188176b7..2bc857745a9 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -104,6 +105,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fed..de714410524 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,38 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec1 p
+ -> Materialize
+ -> Seq Scan on ec0 n
+ Filter: (ff IS NOT NULL)
+(6 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index b95d30f6586..f59a921cdc6 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6132,6 +6132,811 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+---------------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: ((oid < '10'::oid) AND (oid IS NOT NULL))
+(6 rows)
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = (a * a))
+(6 rows)
+
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+----------------------------------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: ((id = ANY ('{2,3}'::integer[])) AND (id IS NOT NULL))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+----------------------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: ((e2.id IS NOT NULL) AND (e2.code <> e2.code))
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Aggregate (cost=43.84..43.85 rows=1 width=8)
+ -> Seq Scan on emp1 c3 (cost=0.00..38.25 rows=2237 width=0)
+ Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
+(3 rows)
+
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Aggregate (cost=43.84..43.85 rows=1 width=8)
+ -> Seq Scan on emp1 c3 (cost=0.00..38.25 rows=2237 width=0)
+ Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
+(3 rows)
+
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------------------------------
+ Aggregate (cost=43.84..43.85 rows=1 width=8)
+ -> Seq Scan on emp1 c3 (cost=0.00..38.25 rows=2237 width=0)
+ Filter: ((id IS NOT NULL) AND ((id * id) IS NOT NULL))
+(3 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj sq
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index aae5d51e1c9..271313ebf86 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -129,10 +129,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(21 rows)
+(22 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1950e6f281f..a73c1f90c4a 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -2499,16 +2499,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
- Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ QUERY PLAN
+--------------------------------------------------
+ Update on base_tbl
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a31055..77dd964ebf2 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3e5032b04dd..f8b02601775 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2309,6 +2309,365 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p where exists (select * from sj q where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a
+ and t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select * from sj t1 join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+SELECT * FROM (
+ SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
+ ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
+) AS q1
+LEFT JOIN
+ (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
+ON q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJR corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJR for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+
+explain (costs off) -- Remove SJ, define uniqueness by a constant
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Remove SJ, define uniqueness by a constant expression
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
+; -- Return one row
+
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+
+explain (costs off) -- Shuffle a clause. Remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
+; -- SJs must be removed.
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains expressions in both sides, Remove SJ.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
+; -- Empty set of rows should be returned
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.a)
+; -- Restriction contains volatile function - disable SJR feature.
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3)
+; -- Return one row
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+explain (costs off) -- Remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+explain (costs off) -- Don't remove SJ
+ SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+explain (costs off)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 ( id SERIAL PRIMARY KEY NOT NULL, code int);
+explain (verbose, costs off)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+explain SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+explain (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+explain (COSTS OFF)
+SELECT * FROM
+ (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+--
+---- Only one side is unqiue
+--select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
+--select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
+--
+---- Several uniques indexes match, and we select a different one
+---- for each side, so the join is not removed
+--create table sm(a int unique, b int unique, c int unique);
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.b and m.c = n.c;
+--explain (costs off)
+--select * from sm m, sm n where m.a = n.c and m.b = n.b;
+--explain (costs off)
+--select * from sm m, sm n where m.c = n.b and m.a = n.a;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- TODO: Why this select returns two copies of ctid field? Should we fix it?
+EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+EXPLAIN (COSTS OFF) -- A RowMark exists for the table being kept
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj; -- A RowMark exists for the table being dropped
+DROP RULE sj_del_rule ON sj CASCADE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 06b25617bc9..085d0d7e548 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -367,6 +367,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2473,6 +2474,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -3835,6 +3837,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.3 (Apple Git-145)
Hi Alexander,
23.10.2023 14:29, Alexander Korotkov wrote:
Fixed all of the above. Thank you for catching this!
I've discovered that starting from d3d55ce57 the following query:
CREATE TABLE t(a int PRIMARY KEY);
WITH tt AS (SELECT * FROM t)
UPDATE t SET a = tt.a + 1 FROM tt
WHERE tt.a = t.a RETURNING t.a;
triggers an error "variable not found in subplan target lists".
(Commits 8a8ed916f and b5fb6736e don't fix this, unfortunately.)
Best regards,
Alexander
On 29/12/2023 12:00, Alexander Lakhin wrote:
Hi Alexander,
23.10.2023 14:29, Alexander Korotkov wrote:
Fixed all of the above. Thank you for catching this!
I've discovered that starting from d3d55ce57 the following query:
CREATE TABLE t(a int PRIMARY KEY);WITH tt AS (SELECT * FROM t)
UPDATE t SET a = tt.a + 1 FROM tt
WHERE tt.a = t.a RETURNING t.a;triggers an error "variable not found in subplan target lists".
(Commits 8a8ed916f and b5fb6736e don't fix this, unfortunately.)
Thanks for the report!
The problem is with the resultRelation field. We forget to replace the
relid here.
Could you check your issue with the patch in the attachment? Does it
resolve this case?
--
regards,
Andrei Lepikhov
Postgres Professional
Attachments:
quickfix.difftext/plain; charset=UTF-8; name=quickfix.diffDownload
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 6c02fe8908..f79c673afd 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1861,6 +1861,8 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
/* Replace varno in all the query structures */
query_tree_walker(root->parse, replace_varno_walker, &ctx,
QTW_EXAMINE_SORTGROUP);
+ if (root->parse->resultRelation == toRemove->relid)
+ root->parse->resultRelation = toKeep->relid;
/* Replace links in the planner info */
remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
@@ -1870,6 +1872,9 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
toRemove->relid, toKeep->relid);
replace_varno((Node *) root->processed_groupClause,
toRemove->relid, toKeep->relid);
+ replace_relid(root->all_result_relids, toRemove->relid, toKeep->relid);
+ replace_relid(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
/*
* There may be references to the rel in root->fkey_list, but if so,
Hi Andrei,
29.12.2023 12:58, Andrei Lepikhov wrote:
Thanks for the report!
The problem is with the resultRelation field. We forget to replace the relid here.
Could you check your issue with the patch in the attachment? Does it resolve this case?
Yes, with the patch applied I see no error.
Thank you!
Best regards,
Alexander
Hello Andrei and Alexander,
Please look at the following query which produces an incorrect result since
d3d55ce57:
CREATE TABLE t(a int PRIMARY KEY, b int);
INSERT INTO t VALUES (1, 1), (2, 1);
SELECT * FROM t WHERE EXISTS (SELECT * FROM t t2 WHERE t2.a = t.b AND t2.b > 0);
a | b
---+---
1 | 1
(1 row)
I think that the expected result is:
a | b
---+---
1 | 1
2 | 1
(2 rows)
Best regards,
Alexander
On Mon, Jan 8, 2024 at 10:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
Please look at the following query which produces an incorrect result since
d3d55ce57:
CREATE TABLE t(a int PRIMARY KEY, b int);
INSERT INTO t VALUES (1, 1), (2, 1);
SELECT * FROM t WHERE EXISTS (SELECT * FROM t t2 WHERE t2.a = t.b AND t2.b > 0);a | b
---+---
1 | 1
(1 row)I think that the expected result is:
a | b
---+---
1 | 1
2 | 1
(2 rows)
Thank you for your report. I'm looking at this now.
------
Regards,
Alexander Korotkov
On Mon, Jan 8, 2024 at 10:20 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Jan 8, 2024 at 10:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
Please look at the following query which produces an incorrect result since
d3d55ce57:
CREATE TABLE t(a int PRIMARY KEY, b int);
INSERT INTO t VALUES (1, 1), (2, 1);
SELECT * FROM t WHERE EXISTS (SELECT * FROM t t2 WHERE t2.a = t.b AND t2.b > 0);a | b
---+---
1 | 1
(1 row)I think that the expected result is:
a | b
---+---
1 | 1
2 | 1
(2 rows)Thank you for your report. I'm looking at this now.
Fixed in 30b4955a46.
------
Regards,
Alexander Korotkov
09.01.2024 01:09, Alexander Korotkov wrote:
Fixed in 30b4955a46.
Thank you for fixing that!
I've found another anomaly coined with d3d55ce57. This query:
CREATE TABLE t(a int PRIMARY KEY, b int);
INSERT INTO t VALUES (1, 1), (2, 1);
WITH t1 AS (SELECT * FROM t)
UPDATE t SET b = t1.b + 1 FROM t1
WHERE t.a = t1.a RETURNING t.a, t1.b;
gives "ERROR: variable not found in subplan target lists" on d3d55ce57, but
starting from a7928a57b it gives an incorrect result:
a | b
---+---
1 | 2
2 | 2
(2 rows)
Best regards,
Alexander
On Tue, Jan 9, 2024 at 6:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
09.01.2024 01:09, Alexander Korotkov wrote:
Fixed in 30b4955a46.
Thank you for fixing that!
I've found another anomaly coined with d3d55ce57. This query:
CREATE TABLE t(a int PRIMARY KEY, b int);
INSERT INTO t VALUES (1, 1), (2, 1);WITH t1 AS (SELECT * FROM t)
UPDATE t SET b = t1.b + 1 FROM t1
WHERE t.a = t1.a RETURNING t.a, t1.b;gives "ERROR: variable not found in subplan target lists" on d3d55ce57, but
starting from a7928a57b it gives an incorrect result:
a | b
---+---
1 | 2
2 | 2
(2 rows)
I see. It seems to be not safe to apply SJE to the modify table
target relation because it could use a different snapshot for the
RETURNING clause. I think we should just forbid SJE to involve the
modify table target relation. I'm planning to fix this later today.
------
Regards,
Alexander Korotkov
On Tue, Jan 9, 2024 at 8:08 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Tue, Jan 9, 2024 at 6:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
09.01.2024 01:09, Alexander Korotkov wrote:
Fixed in 30b4955a46.
Thank you for fixing that!
I've found another anomaly coined with d3d55ce57. This query:
CREATE TABLE t(a int PRIMARY KEY, b int);
INSERT INTO t VALUES (1, 1), (2, 1);WITH t1 AS (SELECT * FROM t)
UPDATE t SET b = t1.b + 1 FROM t1
WHERE t.a = t1.a RETURNING t.a, t1.b;gives "ERROR: variable not found in subplan target lists" on d3d55ce57, but
starting from a7928a57b it gives an incorrect result:
a | b
---+---
1 | 2
2 | 2
(2 rows)I see. It seems to be not safe to apply SJE to the modify table
target relation because it could use a different snapshot for the
RETURNING clause. I think we should just forbid SJE to involve the
modify table target relation. I'm planning to fix this later today.
Fixed in 8c441c08279.
------
Regards,
Alexander Korotkov
Hi Alexander,
09.01.2024 01:09, Alexander Korotkov wrote:
Fixed in 30b4955a46.
Please look at the following query which fails with an error since
d3d55ce57:
create table t (i int primary key);
select t3.i from t t1
join t t2 on t1.i = t2.i,
lateral (select t1.i limit 1) t3;
ERROR: non-LATERAL parameter required by subquery
Best regards,
Alexander
On Sun, Feb 18, 2024 at 3:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
09.01.2024 01:09, Alexander Korotkov wrote:
Fixed in 30b4955a46.
Please look at the following query which fails with an error since
d3d55ce57:create table t (i int primary key);
select t3.i from t t1
join t t2 on t1.i = t2.i,
lateral (select t1.i limit 1) t3;ERROR: non-LATERAL parameter required by subquery
Thank you for spotting. I'm looking at this.
------
Regards,
Alexander Korotkov
On Sun, Feb 18, 2024 at 5:04 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sun, Feb 18, 2024 at 3:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
09.01.2024 01:09, Alexander Korotkov wrote:
Fixed in 30b4955a46.
Please look at the following query which fails with an error since
d3d55ce57:create table t (i int primary key);
select t3.i from t t1
join t t2 on t1.i = t2.i,
lateral (select t1.i limit 1) t3;ERROR: non-LATERAL parameter required by subquery
Thank you for spotting. I'm looking at this.
Attached is a draft patch fixing this query. Could you, please, recheck?
------
Regards,
Alexander Korotkov
Attachments:
0001-Replace-relids-in-lateral-subquery-target-list-du-v1.patchapplication/octet-stream; name=0001-Replace-relids-in-lateral-subquery-target-list-du-v1.patchDownload
From 6eac70fb171b33d23b1afce0363257bbef4c13dd Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 18 Feb 2024 18:15:43 +0200
Subject: [PATCH] Replace relids in lateral subquery target list during SJE
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/CAPpHfdsJ%3DMZNFw0L2T_k-shOk%2B4dPNx9j5RZzicy3NW%2ByiiQVg%40mail.gmail.com
---
src/backend/optimizer/plan/analyzejoins.c | 3 +++
src/test/regress/expected/join.out | 14 ++++++++++++++
src/test/regress/sql/join.sql | 6 ++++++
3 files changed, 23 insertions(+)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index e494acd51a6..88b124d99fc 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -372,6 +372,7 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
for (rti = 1; rti < root->simple_rel_array_size; rti++)
{
RelOptInfo *otherrel = root->simple_rel_array[rti];
+ RangeTblEntry *rte = root->simple_rte_array[rti];
int attroff;
/* there may be empty slots corresponding to non-baserel RTEs */
@@ -396,6 +397,8 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
/* Update lateral references. */
replace_varno((Node *) otherrel->lateral_vars, relid, subst);
+ if (otherrel->rtekind == RTE_SUBQUERY && rte->lateral)
+ replace_varno((Node *) rte->subquery->targetList, relid, subst);
}
/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 0c2cba89213..70789eaf462 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6349,6 +6349,20 @@ on true;
-> Seq Scan on int8_tbl y
(7 rows)
+-- Test processing target lists in lateral subqueries
+explain (costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a,
+ lateral (select t1.a limit 1) t3;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ -> Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ -> Limit
+ -> Result
+(5 rows)
+
-- Check updating of Lateral links from top-level query to the removing relation
explain (COSTS OFF)
SELECT * FROM pg_am am WHERE am.amname IN (
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 257f727a2be..c307b500402 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2406,6 +2406,12 @@ left join (select coalesce(y.q1, 1) from int8_tbl y
on true) z
on true;
+-- Test processing target lists in lateral subqueries
+explain (costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a,
+ lateral (select t1.a limit 1) t3;
+
-- Check updating of Lateral links from top-level query to the removing relation
explain (COSTS OFF)
SELECT * FROM pg_am am WHERE am.amname IN (
--
2.39.3 (Apple Git-145)
18.02.2024 19:18, Alexander Korotkov wrote:
Attached is a draft patch fixing this query. Could you, please, recheck?
Yes, this patch fixes the behavior for that query (I've also tried several
similar queries). Though I don't know the code well enough to judge the
code change.
Best regards,
Alexander
On 18/2/2024 23:18, Alexander Korotkov wrote:
On Sun, Feb 18, 2024 at 5:04 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sun, Feb 18, 2024 at 3:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
09.01.2024 01:09, Alexander Korotkov wrote:
Fixed in 30b4955a46.
Please look at the following query which fails with an error since
d3d55ce57:create table t (i int primary key);
select t3.i from t t1
join t t2 on t1.i = t2.i,
lateral (select t1.i limit 1) t3;ERROR: non-LATERAL parameter required by subquery
Thank you for spotting. I'm looking at this.
Attached is a draft patch fixing this query. Could you, please, recheck?
I reviewed this patch. Why do you check only the target list? I guess
these links can be everywhere. See the patch in the attachment with the
elaborated test and slightly changed code.
--
regards,
Andrei Lepikhov
Postgres Professional
Attachments:
0001-Replace-relids-in-lateral-subquery-target-list-du-v2.patchtext/plain; charset=UTF-8; name=0001-Replace-relids-in-lateral-subquery-target-list-du-v2.patchDownload
From 7f94a3c96fd410522b87e570240cdb96b300dd31 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Mon, 19 Feb 2024 12:17:55 +0700
Subject: [PATCH] Replace relids in lateral subquery target list during SJE
---
src/backend/optimizer/plan/analyzejoins.c | 29 ++++++++++++++-
src/test/regress/expected/join.out | 44 +++++++++++++++++++++++
src/test/regress/sql/join.sql | 12 +++++++
3 files changed, 84 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index e494acd51a..072298f66c 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -395,7 +395,34 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
}
/* Update lateral references. */
- replace_varno((Node *) otherrel->lateral_vars, relid, subst);
+ if (root->hasLateralRTEs)
+ {
+ RangeTblEntry *rte = root->simple_rte_array[rti];
+ ReplaceVarnoContext ctx = {.from = relid,.to = subst};
+
+ if (rte->lateral)
+ {
+ replace_varno((Node *) otherrel->lateral_vars, relid, subst);
+
+ /*
+ * Although we pass root->parse through cleanup procedure,
+ * but parse->rtable and rte contains refs to different copies
+ * of the subquery.
+ */
+ if (otherrel->rtekind == RTE_SUBQUERY)
+ query_tree_walker(rte->subquery, replace_varno_walker, &ctx,
+ QTW_EXAMINE_SORTGROUP);
+#ifdef USE_ASSERT_CHECKING
+ /* Just check possibly hidden non-replaced relids */
+ Assert(!bms_is_member(relid, pull_varnos(root, (Node *) rte->tablesample)));
+ Assert(!bms_is_member(relid, pull_varnos(root, (Node *) rte->functions)));
+ Assert(!bms_is_member(relid, pull_varnos(root, (Node *) rte->tablefunc)));
+ Assert(!bms_is_member(relid, pull_varnos(root, (Node *) rte->values_lists)));
+#endif
+ }
+ }
+
+
}
/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 0c2cba8921..d560a4a6b9 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6349,6 +6349,50 @@ on true;
-> Seq Scan on int8_tbl y
(7 rows)
+-- Test processing target lists in lateral subqueries
+explain (verbose, costs off)
+SELECT t3.a FROM sj t1, sj t2,
+LATERAL (SELECT t1.a WHERE t1.a <> 1
+ GROUP BY (t1.a) HAVING t1.a > 0 ORDER BY t1.a LIMIT 1) t3,
+LATERAL (SELECT t1.a,t3.a WHERE t1.a <> t3.a+t2.a
+ GROUP BY (t3.a) HAVING t1.a > t3.a*t3.a+t2.a/t1.a LIMIT 2) t4,
+LATERAL (SELECT * FROM sj TABLESAMPLE bernoulli(t1.a/t2.a)
+ REPEATABLE (t1.a+t2.a)) t5,
+LATERAL generate_series(1, t1.a + t2.a) AS t6
+WHERE t1.a = t2.a;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Nested Loop
+ Output: t2.a, (t2.a)
+ -> Nested Loop
+ Output: t2.a, (t2.a)
+ -> Nested Loop
+ Output: t2.a, (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Limit
+ Output: (t2.a)
+ -> Group
+ Output: t2.a
+ -> Result
+ One-Time Filter: ((t2.a <> 1) AND (t2.a > 0))
+ -> Limit
+ Output: NULL::integer, ((t2.a))
+ -> Group
+ Output: NULL::integer, (t2.a)
+ -> Result
+ One-Time Filter: ((t2.a <> ((t2.a) + t2.a)) AND (t2.a > (((t2.a) * (t2.a)) + (t2.a / t2.a))))
+ -> Sample Scan on public.sj
+ Output: sj.a, sj.b, sj.c
+ Sampling: bernoulli ((t2.a / t2.a)) REPEATABLE ((t2.a + t2.a))
+ -> Function Scan on pg_catalog.generate_series t6
+ Output: t6.t6
+ Function Call: generate_series(1, (t2.a + t2.a))
+(29 rows)
+
-- Check updating of Lateral links from top-level query to the removing relation
explain (COSTS OFF)
SELECT * FROM pg_am am WHERE am.amname IN (
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 257f727a2b..9341f80bbc 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2406,6 +2406,18 @@ left join (select coalesce(y.q1, 1) from int8_tbl y
on true) z
on true;
+-- Test processing target lists in lateral subqueries
+explain (verbose, costs off)
+SELECT t3.a FROM sj t1, sj t2,
+LATERAL (SELECT t1.a WHERE t1.a <> 1
+ GROUP BY (t1.a) HAVING t1.a > 0 ORDER BY t1.a LIMIT 1) t3,
+LATERAL (SELECT t1.a,t3.a WHERE t1.a <> t3.a+t2.a
+ GROUP BY (t3.a) HAVING t1.a > t3.a*t3.a+t2.a/t1.a LIMIT 2) t4,
+LATERAL (SELECT * FROM sj TABLESAMPLE bernoulli(t1.a/t2.a)
+ REPEATABLE (t1.a+t2.a)) t5,
+LATERAL generate_series(1, t1.a + t2.a) AS t6
+WHERE t1.a = t2.a;
+
-- Check updating of Lateral links from top-level query to the removing relation
explain (COSTS OFF)
SELECT * FROM pg_am am WHERE am.amname IN (
--
2.43.0
On Mon, Feb 19, 2024 at 1:24 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru>
wrote:
On 18/2/2024 23:18, Alexander Korotkov wrote:
On Sun, Feb 18, 2024 at 5:04 PM Alexander Korotkov <aekorotkov@gmail.com>
wrote:
On Sun, Feb 18, 2024 at 3:00 PM Alexander Lakhin <exclusion@gmail.com>
wrote:
Please look at the following query which fails with an error since
d3d55ce57:create table t (i int primary key);
select t3.i from t t1
join t t2 on t1.i = t2.i,
lateral (select t1.i limit 1) t3;ERROR: non-LATERAL parameter required by subquery
Thank you for spotting. I'm looking at this.
Attached is a draft patch fixing this query. Could you, please, recheck?
I reviewed this patch. Why do you check only the target list? I guess
these links can be everywhere. See the patch in the attachment with the
elaborated test and slightly changed code.
I just noticed that this fix has been committed in 489072ab7a, but it's
just flat wrong.
* The fix walks the subquery and replaces all the Vars with a varno
equal to the relid of the removing rel, without checking the
varlevelsup. That is to say, a Var that belongs to the subquery itself
might also be replaced, which is wrong. For instance,
create table t (i int primary key);
explain (costs off)
select t3.i from t t1
join t t2 on t1.i = t2.i
join lateral (select * from (select t1.i offset 0) offset 0) t3 on true;
ERROR: no relation entry for relid 2
* The fix only traverses one level within the subquery, so Vars that
appear in subqueries with multiple levels cannot be replaced. For
instance,
explain (costs off)
select t4.i from t t1
join t t2 on t1.i = t2.i
join lateral (select t3.i from t t3, (select t1.i) offset 0) t4 on true;
ERROR: non-LATERAL parameter required by subquery
I think the right fix for these issues is to introduce a new element
'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
to 1) recurse into subselects with sublevels_up increased, and 2)
perform the replacement only when varlevelsup is equal to sublevels_up.
Attached is a patch for the fix.
While writing the fix, I noticed some outdated comments. Such as in
remove_rel_from_query, the first for loop updates otherrel's attr_needed
as well as lateral_vars, but the comment only mentions attr_needed. So
this patch also fixes some outdated comments.
While writing the test cases, I found that the test cases for SJE are
quite messy. Below are what I have noticed:
* There are several test cases using catalog tables like pg_class,
pg_stats, pg_index, etc. for testing join removal. I don't see a reason
why we need to use catalog tables, and I think this just raises the risk
of instability.
* In many test cases, a mix of uppercase and lowercase keywords is used
in one query. I think it'd better to maintain consistency by using
either all uppercase or all lowercase keywords in one query.
* In most situations, we verify the plan and the output of a query like:
explain (costs off)
select ...;
select ...;
The two select queries are supposed to be the same. But in the SJE test
cases, I have noticed instances where the two select queries differ from
each other.
This patch also includes some cosmetic tweaks for SJE test cases. It
does not change the test cases using catalog tables though. I think
that should be a seperate patch.
Thanks
Richard
Attachments:
v1-0001-Replace-lateral-references-to-removed-rels-in-subqueries.patchapplication/octet-stream; name=v1-0001-Replace-lateral-references-to-removed-rels-in-subqueries.patchDownload
From b70cb2f48defdf5e2bf66e08bbee122076d829d0 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 21 Feb 2024 14:15:33 +0800
Subject: [PATCH v1] Replace lateral references to removed rels in subqueries
---
src/backend/optimizer/plan/analyzejoins.c | 147 +++++++------
src/test/regress/expected/join.out | 256 +++++++++++-----------
src/test/regress/sql/join.sql | 196 +++++++++--------
3 files changed, 311 insertions(+), 288 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 072298f66c..e5e0df1374 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -34,15 +34,6 @@
#include "optimizer/tlist.h"
#include "utils/lsyscache.h"
-/*
- * The context for replace_varno_walker() containing source and target relids.
- */
-typedef struct
-{
- int from;
- int to;
-} ReplaceVarnoContext;
-
/*
* The struct containing self-join candidate. Used to find duplicate reloids.
*/
@@ -75,13 +66,11 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
List **extra_clauses);
-static Bitmapset *replace_relid(Relids relids, int oldId, int newId);
static void replace_varno(Node *node, int from, int to);
-static bool replace_varno_walker(Node *node, ReplaceVarnoContext *ctx);
+static Bitmapset *replace_relid(Relids relids, int oldId, int newId);
static int self_join_candidates_cmp(const void *a, const void *b);
-
/*
* remove_useless_joins
* Check for relations that don't actually need to be joined at all,
@@ -367,7 +356,8 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
ListCell *l;
/*
- * Remove references to the rel from other baserels' attr_needed arrays.
+ * Remove references to the rel from other baserels' attr_needed arrays and
+ * lateral_vars lists.
*/
for (rti = 1; rti < root->simple_rel_array_size; rti++)
{
@@ -394,35 +384,8 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
- /* Update lateral references. */
- if (root->hasLateralRTEs)
- {
- RangeTblEntry *rte = root->simple_rte_array[rti];
- ReplaceVarnoContext ctx = {.from = relid,.to = subst};
-
- if (rte->lateral)
- {
- replace_varno((Node *) otherrel->lateral_vars, relid, subst);
-
- /*
- * Although we pass root->parse through cleanup procedure,
- * but parse->rtable and rte contains refs to different copies
- * of the subquery.
- */
- if (otherrel->rtekind == RTE_SUBQUERY)
- query_tree_walker(rte->subquery, replace_varno_walker, &ctx,
- QTW_EXAMINE_SORTGROUP);
-#ifdef USE_ASSERT_CHECKING
- /* Just check possibly hidden non-replaced relids */
- Assert(!bms_is_member(relid, pull_varnos(root, (Node *) rte->tablesample)));
- Assert(!bms_is_member(relid, pull_varnos(root, (Node *) rte->functions)));
- Assert(!bms_is_member(relid, pull_varnos(root, (Node *) rte->tablefunc)));
- Assert(!bms_is_member(relid, pull_varnos(root, (Node *) rte->values_lists)));
-#endif
- }
- }
-
-
+ /* Update lateral_vars list. */
+ replace_varno((Node *) otherrel->lateral_vars, relid, subst);
}
/*
@@ -1462,35 +1425,32 @@ is_innerrel_unique_for(PlannerInfo *root,
}
/*
- * Replace each occurrence of removing relid with the keeping one
+ * replace_varno - find any Vars, PlaceHolderVar and Relids in the given tree
+ * that reference the removing relid, and change them to reference the
+ * replacement relid.
+ *
+ * NOTE: although this has the form of a walker, we cheat and modify the
+ * nodes in-place.
*/
-static void
-replace_varno(Node *node, int from, int to)
-{
- ReplaceVarnoContext ctx;
-
- if (to <= 0)
- return;
- ctx.from = from;
- ctx.to = to;
- replace_varno_walker(node, &ctx);
-}
+typedef struct
+{
+ int from;
+ int to;
+ int sublevels_up;
+} ReplaceVarnoContext;
-/*
- * Walker function for replace_varno()
- */
static bool
replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
{
if (node == NULL)
return false;
-
if (IsA(node, Var))
{
Var *var = (Var *) node;
- if (var->varno == ctx->from)
+ if (var->varno == ctx->from &&
+ var->varlevelsup == ctx->sublevels_up)
{
var->varno = ctx->to;
var->varnosyn = ctx->to;
@@ -1501,11 +1461,29 @@ replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
{
PlaceHolderVar *phv = (PlaceHolderVar *) node;
- phv->phrels = replace_relid(phv->phrels, ctx->from, ctx->to);
- phv->phnullingrels = replace_relid(phv->phnullingrels, ctx->from, ctx->to);
+ if (phv->phlevelsup == ctx->sublevels_up)
+ {
+ phv->phrels =
+ replace_relid(phv->phrels, ctx->from, ctx->to);
+ phv->phnullingrels =
+ replace_relid(phv->phnullingrels, ctx->from, ctx->to);
+ }
/* fall through to recurse into the placeholder's expression */
}
+ else if (IsA(node, Query))
+ {
+ /* Recurse into subselects */
+ bool result;
+
+ ctx->sublevels_up++;
+ result = query_tree_walker((Query *) node,
+ replace_varno_walker,
+ (void *) ctx,
+ QTW_EXAMINE_SORTGROUP);
+ ctx->sublevels_up--;
+ return result;
+ }
else if (IsA(node, RestrictInfo))
{
RestrictInfo *rinfo = (RestrictInfo *) node;
@@ -1517,18 +1495,24 @@ replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
{
replace_varno((Node *) rinfo->clause, ctx->from, ctx->to);
replace_varno((Node *) rinfo->orclause, ctx->from, ctx->to);
- rinfo->clause_relids = replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
- rinfo->left_relids = replace_relid(rinfo->left_relids, ctx->from, ctx->to);
- rinfo->right_relids = replace_relid(rinfo->right_relids, ctx->from, ctx->to);
+ rinfo->clause_relids =
+ replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
+ rinfo->left_relids =
+ replace_relid(rinfo->left_relids, ctx->from, ctx->to);
+ rinfo->right_relids =
+ replace_relid(rinfo->right_relids, ctx->from, ctx->to);
}
if (is_req_equal)
rinfo->required_relids = rinfo->clause_relids;
else
- rinfo->required_relids = replace_relid(rinfo->required_relids, ctx->from, ctx->to);
+ rinfo->required_relids =
+ replace_relid(rinfo->required_relids, ctx->from, ctx->to);
- rinfo->outer_relids = replace_relid(rinfo->outer_relids, ctx->from, ctx->to);
- rinfo->incompatible_relids = replace_relid(rinfo->incompatible_relids, ctx->from, ctx->to);
+ rinfo->outer_relids =
+ replace_relid(rinfo->outer_relids, ctx->from, ctx->to);
+ rinfo->incompatible_relids =
+ replace_relid(rinfo->incompatible_relids, ctx->from, ctx->to);
if (rinfo->mergeopfamilies &&
bms_get_singleton_member(rinfo->clause_relids, &relid) &&
@@ -1556,7 +1540,30 @@ replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
return false;
}
- return expression_tree_walker(node, replace_varno_walker, (void *) ctx);
+
+ return expression_tree_walker(node, replace_varno_walker,
+ (void *) ctx);
+}
+
+static void
+replace_varno(Node *node, int from, int to)
+{
+ ReplaceVarnoContext ctx;
+
+ if (to <= 0)
+ return;
+
+ ctx.from = from;
+ ctx.to = to;
+ ctx.sublevels_up = 0;
+
+ /*
+ * Must be prepared to start with a Query or a bare expression tree.
+ */
+ query_or_expression_tree_walker(node,
+ replace_varno_walker,
+ (void *) &ctx,
+ QTW_EXAMINE_SORTGROUP);
}
/*
@@ -1748,7 +1755,6 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
int i;
List *jinfo_candidates = NIL;
List *binfo_candidates = NIL;
- ReplaceVarnoContext ctx = {.from = toRemove->relid,.to = toKeep->relid};
Assert(toKeep->relid != -1);
@@ -1925,8 +1931,7 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
}
/* Replace varno in all the query structures */
- query_tree_walker(root->parse, replace_varno_walker, &ctx,
- QTW_EXAMINE_SORTGROUP);
+ replace_varno((Node *) root->parse, toRemove->relid, toKeep->relid);
/* See remove_self_joins_one_group() */
Assert(root->parse->resultRelation != toRemove->relid);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index d560a4a6b9..45f7d1cfc9 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6314,13 +6314,13 @@ left join int8_tbl z on y.a = z.q1;
(6 rows)
explain (costs off)
-SELECT * FROM (
- SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
- ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
-) AS q1
-LEFT JOIN
- (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
-ON q1.ax = q2.a;
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c*t1.c = t2.c+2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
QUERY PLAN
---------------------------------------------------------------------------
Nested Loop Left Join
@@ -6349,49 +6349,52 @@ on true;
-> Seq Scan on int8_tbl y
(7 rows)
--- Test processing target lists in lateral subqueries
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
explain (verbose, costs off)
-SELECT t3.a FROM sj t1, sj t2,
-LATERAL (SELECT t1.a WHERE t1.a <> 1
- GROUP BY (t1.a) HAVING t1.a > 0 ORDER BY t1.a LIMIT 1) t3,
-LATERAL (SELECT t1.a,t3.a WHERE t1.a <> t3.a+t2.a
- GROUP BY (t3.a) HAVING t1.a > t3.a*t3.a+t2.a/t1.a LIMIT 2) t4,
-LATERAL (SELECT * FROM sj TABLESAMPLE bernoulli(t1.a/t2.a)
- REPEATABLE (t1.a+t2.a)) t5,
-LATERAL generate_series(1, t1.a + t2.a) AS t6
-WHERE t1.a = t2.a;
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
Nested Loop
Output: (t2.a)
- -> Nested Loop
- Output: t2.a, (t2.a)
- -> Nested Loop
- Output: t2.a, (t2.a)
- -> Nested Loop
- Output: t2.a, (t2.a)
- -> Seq Scan on public.sj t2
- Output: t2.a, t2.b, t2.c
- Filter: (t2.a IS NOT NULL)
- -> Limit
- Output: (t2.a)
- -> Group
- Output: t2.a
- -> Result
- One-Time Filter: ((t2.a <> 1) AND (t2.a > 0))
- -> Limit
- Output: NULL::integer, ((t2.a))
- -> Group
- Output: NULL::integer, (t2.a)
- -> Result
- One-Time Filter: ((t2.a <> ((t2.a) + t2.a)) AND (t2.a > (((t2.a) * (t2.a)) + (t2.a / t2.a))))
- -> Sample Scan on public.sj
- Output: sj.a, sj.b, sj.c
- Sampling: bernoulli ((t2.a / t2.a)) REPEATABLE ((t2.a + t2.a))
- -> Function Scan on pg_catalog.generate_series t6
- Output: t6.t6
- Function Call: generate_series(1, (t2.a + t2.a))
-(29 rows)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
-- Check updating of Lateral links from top-level query to the removing relation
explain (COSTS OFF)
@@ -6412,13 +6415,14 @@ SELECT * FROM pg_am am WHERE am.amname IN (
(6 rows)
--
--- SJR corner case: uniqueness of an inner is [partially] derived from
+-- SJE corner case: uniqueness of an inner is [partially] derived from
-- baserestrictinfo clauses.
-- XXX: We really should allow SJR for these corner cases?
--
INSERT INTO sj VALUES (3, 1, 3);
-explain (costs off) -- Don't remove SJ
- SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
QUERY PLAN
------------------------------
Nested Loop
@@ -6429,92 +6433,87 @@ explain (costs off) -- Don't remove SJ
Filter: (a = 3)
(6 rows)
-SELECT * FROM sj j1, sj j2
-WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
a | b | c | a | b | c
---+---+---+---+---+---
2 | 1 | 1 | 3 | 1 | 3
(1 row)
-explain (costs off) -- Remove SJ, define uniqueness by a constant
- SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
QUERY PLAN
-----------------------------------------
Seq Scan on sj j2
Filter: ((b IS NOT NULL) AND (a = 2))
(2 rows)
-SELECT * FROM sj j1, sj j2
-WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
a | b | c | a | b | c
---+---+---+---+---+---
2 | 1 | 1 | 2 | 1 | 1
(1 row)
+-- Remove SJ, define uniqueness by a constant expression
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
- AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
-; -- Remove SJ, define uniqueness by a constant expression
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Seq Scan on sj j2
Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
(2 rows)
+-- Return one row
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
- AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
-; -- Return one row
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
a | b | c | a | b | c
---+---+---+---+---+---
3 | 1 | 3 | 3 | 1 | 3
(1 row)
-explain (costs off) -- Remove SJ
- SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
QUERY PLAN
-----------------------------------------
Seq Scan on sj j2
Filter: ((b IS NOT NULL) AND (a = 1))
(2 rows)
-SELECT * FROM sj j1, sj j2
-WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
a | b | c | a | b | c
---+---+---+---+---+---
(0 rows)
-explain (costs off) -- Shuffle a clause. Remove SJ
- SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
QUERY PLAN
-----------------------------------------
Seq Scan on sj j2
Filter: ((b IS NOT NULL) AND (a = 1))
(2 rows)
-SELECT * FROM sj j1, sj j2
-WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
a | b | c | a | b | c
---+---+---+---+---+---
(0 rows)
-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
-- after SJ elimination it shouldn't be a mergejoinable clause.
-SELECT t4.*
-FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
-JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
- a | b | c
----+---+---
-(0 rows)
-
EXPLAIN (COSTS OFF)
SELECT t4.*
FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
-JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
-; -- SJs must be removed.
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
QUERY PLAN
---------------------------------
Nested Loop
@@ -6525,10 +6524,18 @@ JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
Filter: (a IS NOT NULL)
(6 rows)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
-- Functional index
CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
-explain (costs off) -- Remove SJ
- SELECT * FROM sj j1, sj j2
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
QUERY PLAN
-----------------------------------------------
@@ -6536,8 +6543,9 @@ explain (costs off) -- Remove SJ
Filter: ((b IS NOT NULL) AND ((a * a) = 1))
(2 rows)
-explain (costs off) -- Don't remove SJ
- SELECT * FROM sj j1, sj j2
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
QUERY PLAN
-------------------------------
@@ -6549,48 +6557,48 @@ explain (costs off) -- Don't remove SJ
Filter: ((a * a) = 2)
(6 rows)
+-- Restriction contains expressions in both sides, Remove SJ.
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
- AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
-; -- Restriction contains expressions in both sides, Remove SJ.
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Seq Scan on sj j2
Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
(2 rows)
+-- Empty set of rows should be returned
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
- AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
-; -- Empty set of rows should be returned
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
a | b | c | a | b | c
---+---+---+---+---+---
(0 rows)
+-- Restriction contains volatile function - disable SJR feature.
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
- AND (j1.a*j1.a) = (random()/3 + 3)::int
- AND (random()/3 + 3)::int = (j2.a*j2.a)
-; -- Restriction contains volatile function - disable SJR feature.
- QUERY PLAN
------------------------------------------------------------------------------------------------------
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
Nested Loop
Join Filter: (j1.b = j2.b)
-> Seq Scan on sj j1
- Filter: ((a * a) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
-> Seq Scan on sj j2
- Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = (a * a))
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
(6 rows)
+-- Return one row
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.c/3) = (random()/3 + 3)::int
- AND (random()/3 + 3)::int = (j2.a*j2.c/3)
-; -- Return one row
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
a | b | c | a | b | c
---+---+---+---+---+---
3 | 1 | 3 | 3 | 1 | 3
@@ -6598,8 +6606,9 @@ WHERE j1.b = j2.b
-- Multiple filters
CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
-explain (costs off) -- Remove SJ
- SELECT * FROM sj j1, sj j2
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
QUERY PLAN
-----------------------------------------------------
@@ -6607,7 +6616,8 @@ explain (costs off) -- Remove SJ
Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
(2 rows)
-explain (costs off) -- Don't remove SJ
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
QUERY PLAN
@@ -6621,8 +6631,9 @@ explain (costs off) -- Don't remove SJ
(6 rows)
CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
-explain (costs off) -- Don't remove SJ
- SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
QUERY PLAN
------------------------------
Nested Loop
@@ -6632,8 +6643,9 @@ explain (costs off) -- Don't remove SJ
-> Seq Scan on sj j2
(5 rows)
-explain (costs off) -- Don't remove SJ
- SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
QUERY PLAN
------------------------------
Nested Loop
@@ -6643,8 +6655,9 @@ explain (costs off) -- Don't remove SJ
-> Seq Scan on sj j1
(5 rows)
-explain (costs off) -- Don't remove SJ
- SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
QUERY PLAN
---------------------------------------------------------------
Nested Loop
@@ -6658,7 +6671,7 @@ DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
-- Test that OR predicated are updated correctly after join removal
CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
-explain (costs off)
+EXPLAIN (COSTS OFF)
SELECT COUNT(*) FROM tab_with_flag
WHERE
(is_flag IS NULL OR is_flag = 0)
@@ -6787,7 +6800,7 @@ reset join_collapse_limit;
reset enable_seqscan;
-- Check that clauses from the join filter list is not lost on the self-join removal
CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
-explain (verbose, costs off)
+EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
QUERY PLAN
------------------------------------------
@@ -6799,7 +6812,7 @@ SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
-- Shuffle self-joined relations. Only in the case of iterative deletion
-- attempts explains of these queries will be identical.
CREATE UNIQUE INDEX ON emp1((id*id));
-explain (costs off)
+EXPLAIN (COSTS OFF)
SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
QUERY PLAN
@@ -6809,7 +6822,7 @@ WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
Filter: ((id * id) IS NOT NULL)
(3 rows)
-explain (costs off)
+EXPLAIN (COSTS OFF)
SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
QUERY PLAN
@@ -6819,7 +6832,7 @@ WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
Filter: ((id * id) IS NOT NULL)
(3 rows)
-explain (costs off)
+EXPLAIN (COSTS OFF)
SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
QUERY PLAN
@@ -6830,7 +6843,7 @@ WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
(3 rows)
-- Check the usage of a parse tree by the set operations (bug #18170)
-explain (costs off)
+EXPLAIN (COSTS OFF)
SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
WHERE c2.id IS NOT NULL
EXCEPT ALL
@@ -6928,7 +6941,7 @@ select * from emp1 t1
(5 rows)
-- Check that SJE doesn't replace the target relation
-explain (costs off)
+EXPLAIN (COSTS OFF)
WITH t1 AS (SELECT * FROM emp1)
UPDATE emp1 SET code = t1.code + 1 FROM t1
WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
@@ -7077,7 +7090,7 @@ where t1.b = t2.b and t2.a = 3 and t1.a = 3
(3 rows)
-- Join qual isn't mergejoinable, but inner is unique.
-explain (COSTS OFF)
+EXPLAIN (COSTS OFF)
SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
QUERY PLAN
-------------------------------
@@ -7088,9 +7101,9 @@ SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
-> Seq Scan on sj n1
(5 rows)
-explain (COSTS OFF)
+EXPLAIN (COSTS OFF)
SELECT * FROM
- (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
WHERE q0.a = 1;
QUERY PLAN
-------------------------------
@@ -7103,20 +7116,6 @@ WHERE q0.a = 1;
-> Seq Scan on sj n1
(7 rows)
---
----- Only one side is unique
---select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
---select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
---
----- Several uniques indexes match, and we select a different one
----- for each side, so the join is not removed
---create table sm(a int unique, b int unique, c int unique);
---explain (costs off)
---select * from sm m, sm n where m.a = n.b and m.c = n.c;
---explain (costs off)
---select * from sm m, sm n where m.a = n.c and m.b = n.b;
---explain (costs off)
---select * from sm m, sm n where m.c = n.b and m.a = n.a;
-- Check optimization disabling if it will violate special join conditions.
-- Two identical joined relations satisfies self join removal conditions but
-- stay in different special join infos.
@@ -7175,7 +7174,8 @@ ON sj_t1.id = _t2t3t4.id;
--
-- Test RowMarks-related code
--
-EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
QUERY PLAN
---------------------------------
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 9341f80bbc..e6ecfe36af 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2390,13 +2390,13 @@ select * from sj x join sj y on x.a = y.a
left join int8_tbl z on y.a = z.q1;
explain (costs off)
-SELECT * FROM (
- SELECT t1.*, t2.a AS ax FROM sj t1 JOIN sj t2
- ON (t1.a = t2.a AND t1.c*t1.c = t2.c+2 AND t2.b IS NULL)
-) AS q1
-LEFT JOIN
- (SELECT t3.* FROM sj t3, sj t4 WHERE t3.c = t4.c) AS q2
-ON q1.ax = q2.a;
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c*t1.c = t2.c+2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
-- Test that placeholders are updated correctly after join removal
explain (costs off)
@@ -2406,17 +2406,22 @@ left join (select coalesce(y.q1, 1) from int8_tbl y
on true) z
on true;
--- Test processing target lists in lateral subqueries
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
explain (verbose, costs off)
-SELECT t3.a FROM sj t1, sj t2,
-LATERAL (SELECT t1.a WHERE t1.a <> 1
- GROUP BY (t1.a) HAVING t1.a > 0 ORDER BY t1.a LIMIT 1) t3,
-LATERAL (SELECT t1.a,t3.a WHERE t1.a <> t3.a+t2.a
- GROUP BY (t3.a) HAVING t1.a > t3.a*t3.a+t2.a/t1.a LIMIT 2) t4,
-LATERAL (SELECT * FROM sj TABLESAMPLE bernoulli(t1.a/t2.a)
- REPEATABLE (t1.a+t2.a)) t5,
-LATERAL generate_series(1, t1.a + t2.a) AS t6
-WHERE t1.a = t2.a;
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
-- Check updating of Lateral links from top-level query to the removing relation
explain (COSTS OFF)
@@ -2428,109 +2433,129 @@ SELECT * FROM pg_am am WHERE am.amname IN (
);
--
--- SJR corner case: uniqueness of an inner is [partially] derived from
+-- SJE corner case: uniqueness of an inner is [partially] derived from
-- baserestrictinfo clauses.
-- XXX: We really should allow SJR for these corner cases?
--
INSERT INTO sj VALUES (3, 1, 3);
-explain (costs off) -- Don't remove SJ
- SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
-SELECT * FROM sj j1, sj j2
-WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3; -- Return one row
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
-explain (costs off) -- Remove SJ, define uniqueness by a constant
- SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
-SELECT * FROM sj j1, sj j2
-WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2; -- Return one row
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Remove SJ, define uniqueness by a constant expression
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
- AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
-; -- Remove SJ, define uniqueness by a constant expression
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
- AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a
-; -- Return one row
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
-explain (costs off) -- Remove SJ
- SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
-SELECT * FROM sj j1, sj j2
-WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1; -- Return no rows
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
-explain (costs off) -- Shuffle a clause. Remove SJ
- SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
-SELECT * FROM sj j1, sj j2
-WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1; -- Return no rows
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
SELECT t4.*
FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
-EXPLAIN (COSTS OFF)
SELECT t4.*
FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
-JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42
-; -- SJs must be removed.
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
-- Functional index
CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
-explain (costs off) -- Remove SJ
- SELECT * FROM sj j1, sj j2
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
-explain (costs off) -- Don't remove SJ
- SELECT * FROM sj j1, sj j2
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
- AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
-; -- Restriction contains expressions in both sides, Remove SJ.
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
- AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a)
-; -- Empty set of rows should be returned
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJR feature.
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
- AND (j1.a*j1.a) = (random()/3 + 3)::int
- AND (random()/3 + 3)::int = (j2.a*j2.a)
-; -- Restriction contains volatile function - disable SJR feature.
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.c/3) = (random()/3 + 3)::int
- AND (random()/3 + 3)::int = (j2.a*j2.c/3)
-; -- Return one row
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
-- Multiple filters
CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
-explain (costs off) -- Remove SJ
- SELECT * FROM sj j1, sj j2
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
-explain (costs off) -- Don't remove SJ
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
-explain (costs off) -- Don't remove SJ
- SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
-explain (costs off) -- Don't remove SJ
- SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
-explain (costs off) -- Don't remove SJ
- SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
-- Test that OR predicated are updated correctly after join removal
CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
-explain (costs off)
+
+EXPLAIN (COSTS OFF)
SELECT COUNT(*) FROM tab_with_flag
WHERE
(is_flag IS NULL OR is_flag = 0)
@@ -2581,24 +2606,27 @@ reset enable_seqscan;
-- Check that clauses from the join filter list is not lost on the self-join removal
CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
-explain (verbose, costs off)
+EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
-- Shuffle self-joined relations. Only in the case of iterative deletion
-- attempts explains of these queries will be identical.
CREATE UNIQUE INDEX ON emp1((id*id));
-explain (costs off)
+
+EXPLAIN (COSTS OFF)
SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
-explain (costs off)
+
+EXPLAIN (COSTS OFF)
SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
-explain (costs off)
+
+EXPLAIN (COSTS OFF)
SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
-- Check the usage of a parse tree by the set operations (bug #18170)
-explain (costs off)
+EXPLAIN (COSTS OFF)
SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
WHERE c2.id IS NOT NULL
EXCEPT ALL
@@ -2637,14 +2665,17 @@ select * from emp1 t1
left join emp1 t3 on t1.id > 1 and t1.id < 2;
-- Check that SJE doesn't replace the target relation
-explain (costs off)
+EXPLAIN (COSTS OFF)
WITH t1 AS (SELECT * FROM emp1)
UPDATE emp1 SET code = t1.code + 1 FROM t1
WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
WITH t1 AS (SELECT * FROM emp1)
UPDATE emp1 SET code = t1.code + 1 FROM t1
WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
TRUNCATE emp1;
EXPLAIN (COSTS OFF)
@@ -2707,28 +2738,14 @@ where t1.b = t2.b and t2.a = 3 and t1.a = 3
and t1.a IS NOT NULL and t2.a IS NOT NULL;
-- Join qual isn't mergejoinable, but inner is unique.
-explain (COSTS OFF)
+EXPLAIN (COSTS OFF)
SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
-explain (COSTS OFF)
+
+EXPLAIN (COSTS OFF)
SELECT * FROM
- (SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
WHERE q0.a = 1;
---
----- Only one side is unique
---select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1;
---select * from sl t1, sl t2 where t1.a = t2.a and t2.b = 1;
---
----- Several uniques indexes match, and we select a different one
----- for each side, so the join is not removed
---create table sm(a int unique, b int unique, c int unique);
---explain (costs off)
---select * from sm m, sm n where m.a = n.b and m.c = n.c;
---explain (costs off)
---select * from sm m, sm n where m.a = n.c and m.b = n.b;
---explain (costs off)
---select * from sm m, sm n where m.c = n.b and m.a = n.a;
-
-- Check optimization disabling if it will violate special join conditions.
-- Two identical joined relations satisfies self join removal conditions but
-- stay in different special join infos.
@@ -2763,7 +2780,8 @@ ON sj_t1.id = _t2t3t4.id;
-- Test RowMarks-related code
--
-EXPLAIN (COSTS OFF) -- Both sides have explicit LockRows marks
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
reset enable_hashjoin;
--
2.31.0
On 21/2/2024 14:26, Richard Guo wrote:
This patch also includes some cosmetic tweaks for SJE test cases. It
does not change the test cases using catalog tables though. I think
that should be a seperate patch.
Thanks for this catch, it is really messy thing, keeping aside the
question why we need two different subtrees for the same query.
I will look into your fix.
--
regards,
Andrei Lepikhov
Postgres Professional
On 21/2/2024 14:26, Richard Guo wrote:
I think the right fix for these issues is to introduce a new element
'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
to 1) recurse into subselects with sublevels_up increased, and 2)
perform the replacement only when varlevelsup is equal to sublevels_up.
This code looks good. No idea how we have lost it before.
While writing the fix, I noticed some outdated comments. Such as in
remove_rel_from_query, the first for loop updates otherrel's attr_needed
as well as lateral_vars, but the comment only mentions attr_needed. So
this patch also fixes some outdated comments.
Thanks, looks good.
While writing the test cases, I found that the test cases for SJE are
quite messy. Below are what I have noticed:* There are several test cases using catalog tables like pg_class,
pg_stats, pg_index, etc. for testing join removal. I don't see a reason
why we need to use catalog tables, and I think this just raises the risk
of instability.
I see only one unusual query with the pg_class involved.
* In many test cases, a mix of uppercase and lowercase keywords is used
in one query. I think it'd better to maintain consistency by using
either all uppercase or all lowercase keywords in one query.
I see uppercase -> lowercase change:
select t1.*, t2.a as ax from sj t1 join sj t2
and lowercase -> uppercase in many other cases:
explain (costs off)
I guess it is a matter of taste, so give up for the committer decision.
Technically, it's OK.
* In most situations, we verify the plan and the output of a query like:
explain (costs off)
select ...;
select ...;The two select queries are supposed to be the same. But in the SJE test
cases, I have noticed instances where the two select queries differ from
each other.This patch also includes some cosmetic tweaks for SJE test cases. It
does not change the test cases using catalog tables though. I think
that should be a seperate patch.
I can't assess the necessity of changing these dozens of lines of code
because I follow another commenting style, but technically, it's still OK.
--
regards,
Andrei Lepikhov
Postgres Professional
On Thu, Feb 22, 2024 at 10:51 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
On 21/2/2024 14:26, Richard Guo wrote:
I think the right fix for these issues is to introduce a new element
'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
to 1) recurse into subselects with sublevels_up increased, and 2)
perform the replacement only when varlevelsup is equal to sublevels_up.This code looks good. No idea how we have lost it before.
Thanks to Richard for the patch and to Andrei for review. I also find
code looking good. Pushed with minor edits from me.
------
Regards,
Alexander Korotkov
On Sat, Feb 24, 2024 at 12:36:59AM +0200, Alexander Korotkov wrote:
On Thu, Feb 22, 2024 at 10:51 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
On 21/2/2024 14:26, Richard Guo wrote:
I think the right fix for these issues is to introduce a new element
'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
to 1) recurse into subselects with sublevels_up increased, and 2)
perform the replacement only when varlevelsup is equal to sublevels_up.This code looks good. No idea how we have lost it before.
Thanks to Richard for the patch and to Andrei for review. I also find
code looking good. Pushed with minor edits from me.
I feel this, commit 466979e, misses a few of our project standards:
- The patch makes many non-whitespace changes to existing test queries. This
makes it hard to review the consequences of the non-test part of the patch.
Did you minimize such edits? Of course, not every such edit is avoidable.
- The commit message doesn't convey whether this is refactoring or is a bug
fix. This makes it hard to write release notes, among other things. From
this mailing list thread, it gather it's a bug fix in 489072ab7a, hence
v17-specific. The commit message for 489072ab7a is also silent about that
commit's status as refactoring or as a bug fix.
- Normally, I could answer the previous question by reading the test case
diffs. However, in addition to the first point about non-whitespace
changes, the first three join.sql patch hunks just change whitespace.
Worse, since they move line breaks, "git diff -w" doesn't filter them out.
To what extent are those community standards vs. points of individual
committer preference? Please tell me where I'm wrong here.
Hi, Noah!
On Sat, Feb 24, 2024 at 7:12 AM Noah Misch <noah@leadboat.com> wrote:
On Sat, Feb 24, 2024 at 12:36:59AM +0200, Alexander Korotkov wrote:
On Thu, Feb 22, 2024 at 10:51 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
On 21/2/2024 14:26, Richard Guo wrote:
I think the right fix for these issues is to introduce a new element
'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
to 1) recurse into subselects with sublevels_up increased, and 2)
perform the replacement only when varlevelsup is equal to sublevels_up.This code looks good. No idea how we have lost it before.
Thanks to Richard for the patch and to Andrei for review. I also find
code looking good. Pushed with minor edits from me.I feel this, commit 466979e, misses a few of our project standards:
- The patch makes many non-whitespace changes to existing test queries. This
makes it hard to review the consequences of the non-test part of the patch.
Did you minimize such edits? Of course, not every such edit is avoidable.- The commit message doesn't convey whether this is refactoring or is a bug
fix. This makes it hard to write release notes, among other things. From
this mailing list thread, it gather it's a bug fix in 489072ab7a, hence
v17-specific. The commit message for 489072ab7a is also silent about that
commit's status as refactoring or as a bug fix.- Normally, I could answer the previous question by reading the test case
diffs. However, in addition to the first point about non-whitespace
changes, the first three join.sql patch hunks just change whitespace.
Worse, since they move line breaks, "git diff -w" doesn't filter them out.To what extent are those community standards vs. points of individual
committer preference? Please tell me where I'm wrong here.
I agree that commit 466979e is my individual committer failure. I
should have written a better, more clear commit message and separate
tests refactoring from the bug fix.
I'm not so sure about 489072ab7a (except it provides a wrong fix). It
has a "Reported-by:" field meaning it's a problem reported by a
particular person. The "Discussion:" points directly to the reported
test case. And commit contains the relevant test case. The commit
message could be more wordy though.
------
Regards,
Alexander Korotkov
Hello,
On Sat, Feb 24, 2024 at 01:02:01PM +0200, Alexander Korotkov wrote:
On Sat, Feb 24, 2024 at 7:12 AM Noah Misch <noah@leadboat.com> wrote:
On Sat, Feb 24, 2024 at 12:36:59AM +0200, Alexander Korotkov wrote:
On Thu, Feb 22, 2024 at 10:51 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
On 21/2/2024 14:26, Richard Guo wrote:
I think the right fix for these issues is to introduce a new element
'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
to 1) recurse into subselects with sublevels_up increased, and 2)
perform the replacement only when varlevelsup is equal to sublevels_up.This code looks good. No idea how we have lost it before.
Thanks to Richard for the patch and to Andrei for review. I also find
code looking good. Pushed with minor edits from me.I feel this, commit 466979e, misses a few of our project standards:
- The patch makes many non-whitespace changes to existing test queries. This
makes it hard to review the consequences of the non-test part of the patch.
Did you minimize such edits? Of course, not every such edit is avoidable.- The commit message doesn't convey whether this is refactoring or is a bug
fix. This makes it hard to write release notes, among other things. From
this mailing list thread, it gather it's a bug fix in 489072ab7a, hence
v17-specific. The commit message for 489072ab7a is also silent about that
commit's status as refactoring or as a bug fix.- Normally, I could answer the previous question by reading the test case
diffs. However, in addition to the first point about non-whitespace
changes, the first three join.sql patch hunks just change whitespace.
Worse, since they move line breaks, "git diff -w" doesn't filter them out.To what extent are those community standards vs. points of individual
committer preference? Please tell me where I'm wrong here.I agree that commit 466979e is my individual committer failure. I
should have written a better, more clear commit message and separate
tests refactoring from the bug fix.I'm not so sure about 489072ab7a (except it provides a wrong fix). It
has a "Reported-by:" field meaning it's a problem reported by a
particular person. The "Discussion:" points directly to the reported
test case. And commit contains the relevant test case. The commit
message could be more wordy though.
Agreed, the first and third points don't apply to 489072ab7a. Thanks to that,
one can deduce from its new test case query that it fixes a bug. It sounds
like we agree about commit 466979e, so that's good.
Hello Alexander,
23.10.2023 12:47, Alexander Korotkov wrote:
I think this patch makes substantial improvement to query planning.
It has received plenty of reviews. The code is currently in quite
good shape. I didn't manage to find the cases when this optimization
causes significant overhead to planning time. Even if such cases will
be spotted there is a GUC option to disable this feature. So, I'll
push this if there are no objections.
I've discovered another failure, introduced by d3d55ce57.
Please try the following:
CREATE TABLE t (a int unique, b float);
SELECT * FROM t NATURAL JOIN LATERAL
(SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;
With asserts enabled, it triggers
TRAP: failed Assert("!bms_is_member(rti, lateral_relids)"), File: "initsplan.c", Line: 697, PID: 3074054
ExceptionalCondition at assert.c:52:13
create_lateral_join_info at initsplan.c:700:8
query_planner at planmain.c:257:2
grouping_planner at planner.c:1523:17
subquery_planner at planner.c:1098:2
standard_planner at planner.c:415:9
planner at planner.c:282:12
pg_plan_query at postgres.c:904:9
pg_plan_queries at postgres.c:996:11
exec_simple_query at postgres.c:1193:19
PostgresMain at postgres.c:4684:27
With no asserts, I get:
ERROR: failed to construct the join relation
Please take a look at this.
Best regards,
Alexander
Hi, Alexander!
On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
23.10.2023 12:47, Alexander Korotkov wrote:
I think this patch makes substantial improvement to query planning.
It has received plenty of reviews. The code is currently in quite
good shape. I didn't manage to find the cases when this optimization
causes significant overhead to planning time. Even if such cases will
be spotted there is a GUC option to disable this feature. So, I'll
push this if there are no objections.I've discovered another failure, introduced by d3d55ce57.
Please try the following:
CREATE TABLE t (a int unique, b float);
SELECT * FROM t NATURAL JOIN LATERAL
(SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;With asserts enabled, it triggers
TRAP: failed Assert("!bms_is_member(rti, lateral_relids)"), File: "initsplan.c", Line: 697, PID: 3074054
ExceptionalCondition at assert.c:52:13
create_lateral_join_info at initsplan.c:700:8
query_planner at planmain.c:257:2
grouping_planner at planner.c:1523:17
subquery_planner at planner.c:1098:2
standard_planner at planner.c:415:9
planner at planner.c:282:12
pg_plan_query at postgres.c:904:9
pg_plan_queries at postgres.c:996:11
exec_simple_query at postgres.c:1193:19
PostgresMain at postgres.c:4684:27With no asserts, I get:
ERROR: failed to construct the join relationPlease take a look at this.
I'm looking into this, thank you!
------
Regards,
Alexander Korotkov
On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
23.10.2023 12:47, Alexander Korotkov wrote:
I think this patch makes substantial improvement to query planning.
It has received plenty of reviews. The code is currently in quite
good shape. I didn't manage to find the cases when this optimization
causes significant overhead to planning time. Even if such cases will
be spotted there is a GUC option to disable this feature. So, I'll
push this if there are no objections.I've discovered another failure, introduced by d3d55ce57.
Please try the following:
CREATE TABLE t (a int unique, b float);
SELECT * FROM t NATURAL JOIN LATERAL
(SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;
I think we should just forbid SJE in case when relations to be merged
have cross-references with lateral vars. The draft patch for this is
attached. I'd like to ask Alexander to test it, Richard and Andrei to
review it. Thank you!
------
Regards,
Alexander Korotkov
Attachments:
sje_skip_cross_lateral_vars.patchapplication/octet-stream; name=sje_skip_cross_lateral_vars.patchDownload
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 506fccd20c9..3304b0ea547 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -29,6 +29,7 @@
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
+#include "optimizer/placeholder.h"
#include "optimizer/restrictinfo.h"
#include "utils/lsyscache.h"
@@ -2101,6 +2102,36 @@ match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
return true;
}
+static bool
+check_lateral_vars(PlannerInfo *root, RelOptInfo *inner, RelOptInfo *outer)
+{
+ ListCell *lc;
+
+ foreach(lc, inner->lateral_vars)
+ {
+ Node *node = (Node *) lfirst(lc);
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == outer->relid)
+ return false;
+ }
+ else if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+ PlaceHolderInfo *phinfo = find_placeholder_info(root, phv);
+
+ if (bms_is_member(outer->relid, phinfo->ph_eval_at))
+ return false;
+ }
+ else
+ Assert(false);
+ }
+ return true;
+}
+
/*
* Find and remove unique self joins in a group of base relations that have
* the same Oid.
@@ -2255,6 +2286,10 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
if (!match_unique_clauses(root, inner, uclauses, outer->relid))
continue;
+ if (!check_lateral_vars(root, outer, inner) ||
+ !check_lateral_vars(root, inner, outer))
+ continue;
+
/*
* We can remove either relation, so remove the inner one in order
* to simplify this loop.
30.04.2024 13:20, Alexander Korotkov wrote:
On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
I've discovered another failure, introduced by d3d55ce57.
Please try the following:
CREATE TABLE t (a int unique, b float);
SELECT * FROM t NATURAL JOIN LATERAL
(SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;I think we should just forbid SJE in case when relations to be merged
have cross-references with lateral vars. The draft patch for this is
attached. I'd like to ask Alexander to test it, Richard and Andrei to
review it. Thank you!
Beside LATERAL vars, it seems that SJR doesn't play well with TABLESAMPLE
in general. For instance:
CREATE TABLE t (a int unique);
INSERT INTO t SELECT * FROM generate_series (1,100);
SELECT COUNT(*) FROM (SELECT * FROM t TABLESAMPLE BERNOULLI(1)) t1
NATURAL JOIN (SELECT * FROM t TABLESAMPLE BERNOULLI(100)) t2;
returned 100, 100, 100 for me, though with enable_self_join_removal = off,
I got 4, 0, 1...
Best regards,
Alexander
On Wed, May 1, 2024 at 2:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
30.04.2024 13:20, Alexander Korotkov wrote:
On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
I've discovered another failure, introduced by d3d55ce57.
Please try the following:
CREATE TABLE t (a int unique, b float);
SELECT * FROM t NATURAL JOIN LATERAL
(SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;I think we should just forbid SJE in case when relations to be merged
have cross-references with lateral vars. The draft patch for this is
attached. I'd like to ask Alexander to test it, Richard and Andrei to
review it. Thank you!Beside LATERAL vars, it seems that SJR doesn't play well with TABLESAMPLE
in general. For instance:
CREATE TABLE t (a int unique);
INSERT INTO t SELECT * FROM generate_series (1,100);SELECT COUNT(*) FROM (SELECT * FROM t TABLESAMPLE BERNOULLI(1)) t1
NATURAL JOIN (SELECT * FROM t TABLESAMPLE BERNOULLI(100)) t2;
returned 100, 100, 100 for me, though with enable_self_join_removal = off,
I got 4, 0, 1...
Right, thank you for reporting this.
BTW, I found another case where my previous fix doesn't work.
SELECT * FROM t NATURAL JOIN LATERAL (SELECT * FROM t t2 TABLESAMPLE
SYSTEM (t.b) NATURAL JOIN LATERAL(SELECT * FROM t t3 TABLESAMPLE
SYSTEM (t2.b)) t3) t2;
I think we probably could forbid SJE for the tables with TABLESAMPLE
altogether. Please, check the attached patch.
------
Regards,
Alexander Korotkov
Attachments:
sje_tablesample.patchapplication/octet-stream; name=sje_tablesample.patchDownload
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 506fccd20c9..fe3c9bc9b72 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -2148,6 +2148,10 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
Assert(root->simple_rte_array[k]->relid ==
root->simple_rte_array[r]->relid);
+ if (root->simple_rte_array[k]->tablesample ||
+ root->simple_rte_array[r]->tablesample)
+ continue;
+
/*
* It is impossible to eliminate join of two relations if they
* belong to different rules of order. Otherwise planner can't be
On 5/1/24 18:59, Alexander Korotkov wrote:
I think we probably could forbid SJE for the tables with TABLESAMPLE
altogether. Please, check the attached patch.
Your patch looks good to me. I added some comments and test case into
the join.sql.
One question for me is: Do we anticipate other lateral self-references
except the TABLESAMPLE case? Looking into the extract_lateral_references
implementation, I see the only RTE_SUBQUERY case to be afraid of. But we
pull up subqueries before extracting lateral references. So, if we have
a reference to a subquery, it means we will not flatten this subquery
and don't execute SJE. Do we need more code, as you have written in the
first patch?
--
regards,
Andrei Lepikhov
Postgres Professional
Attachments:
0001-Forbid-self-join-elimination-on-table-sampling-scans.patchtext/x-patch; charset=UTF-8; name=0001-Forbid-self-join-elimination-on-table-sampling-scans.patchDownload
From dac8afd2095586921dfcf5564e7f2979e89b77de Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Thu, 2 May 2024 16:17:52 +0700
Subject: [PATCH] Forbid self-join elimination on table sampling scans.
Having custom table sampling methods we can stuck into unpredictable issues
replacing join with scan operation. It may had sense to analyse possible
situations and enable SJE, but the real profit from this operation looks
too low.
---
src/backend/optimizer/plan/analyzejoins.c | 8 ++++++++
src/backend/optimizer/plan/initsplan.c | 5 ++++-
src/test/regress/expected/join.out | 13 +++++++++++++
src/test/regress/sql/join.sql | 5 +++++
4 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 506fccd20c..bb89558dcd 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -2148,6 +2148,14 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
Assert(root->simple_rte_array[k]->relid ==
root->simple_rte_array[r]->relid);
+ /*
+ * To avoid corner cases with table sampling methods just forbid
+ * SJE for table sampling entries.
+ */
+ if (root->simple_rte_array[k]->tablesample ||
+ root->simple_rte_array[r]->tablesample)
+ continue;
+
/*
* It is impossible to eliminate join of two relations if they
* belong to different rules of order. Otherwise planner can't be
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e2c68fe6f9..bf839bcaf6 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -415,7 +415,10 @@ extract_lateral_references(PlannerInfo *root, RelOptInfo *brel, Index rtindex)
if (!rte->lateral)
return;
- /* Fetch the appropriate variables */
+ /* Fetch the appropriate variables.
+ * Changes in this place may need changes in SJE logic, see
+ * the remove_self_joins_one_group routine.
+ */
if (rte->rtekind == RTE_RELATION)
vars = pull_vars_of_level((Node *) rte->tablesample, 0);
else if (rte->rtekind == RTE_SUBQUERY)
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 8b640c2fc2..63143fe55f 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6900,6 +6900,19 @@ where s1.x = 1;
Filter: (1 = 1)
(9 rows)
+-- Check that SJE doesn't touch TABLESAMPLE joins
+explain (costs off)
+SELECT * FROM emp1 t1 NATURAL JOIN LATERAL
+ (SELECT * FROM emp1 t2 TABLESAMPLE SYSTEM (t1.code));
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Sample Scan on emp1 t2
+ Sampling: system (t1.code)
+ Filter: ((t1.id = id) AND (t1.code = code))
+(5 rows)
+
-- Check that PHVs do not impose any constraints on removing self joins
explain (verbose, costs off)
select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c4c6c7b8ba..184fd0876b 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2652,6 +2652,11 @@ select 1 from emp1 t1 left join
on true
where s1.x = 1;
+-- Check that SJE doesn't touch TABLESAMPLE joins
+explain (costs off)
+SELECT * FROM emp1 t1 NATURAL JOIN LATERAL
+ (SELECT * FROM emp1 t2 TABLESAMPLE SYSTEM (t1.code));
+
-- Check that PHVs do not impose any constraints on removing self joins
explain (verbose, costs off)
select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
--
2.39.2
On Thu, May 2, 2024 at 12:45 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
On 5/1/24 18:59, Alexander Korotkov wrote:
I think we probably could forbid SJE for the tables with TABLESAMPLE
altogether. Please, check the attached patch.Your patch looks good to me. I added some comments and test case into
the join.sql.
Thank you
One question for me is: Do we anticipate other lateral self-references
except the TABLESAMPLE case? Looking into the extract_lateral_references
implementation, I see the only RTE_SUBQUERY case to be afraid of. But we
pull up subqueries before extracting lateral references. So, if we have
a reference to a subquery, it means we will not flatten this subquery
and don't execute SJE. Do we need more code, as you have written in the
first patch?
I think my first patch was crap anyway. Your explanation seems
reasonable to me. I'm not sure this requires any more code. Probably
it would be enough to add more comments about this.
------
Regards,
Alexander Korotkov
On Thu, May 2, 2024 at 6:08 PM Alexander Korotkov <aekorotkov@gmail.com>
wrote:
On Thu, May 2, 2024 at 12:45 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:One question for me is: Do we anticipate other lateral self-references
except the TABLESAMPLE case? Looking into the extract_lateral_references
implementation, I see the only RTE_SUBQUERY case to be afraid of. But we
pull up subqueries before extracting lateral references. So, if we have
a reference to a subquery, it means we will not flatten this subquery
and don't execute SJE. Do we need more code, as you have written in the
first patch?I think my first patch was crap anyway. Your explanation seems
reasonable to me. I'm not sure this requires any more code. Probably
it would be enough to add more comments about this.
The tablesample case is not the only factor that can cause a relation to
have a lateral dependency on itself after self-join removal. It can
also happen with PHVs. As an example, consider
explain (costs off)
select * from t t1
left join lateral
(select t1.a as t1a, * from t t2) t2
on true
where t1.a = t2.a;
server closed the connection unexpectedly
This is because after self-join removal, a PlaceHolderInfo's ph_lateral
might contain rels mentioned in ph_eval_at, which we should get rid of.
For the tablesample case, I agree that we should not consider relations
with TABLESAMPLE clauses as candidates to be removed. Removing such a
relation could potentially change the syntax of the query, as shown by
Alexander's example. It seems to me that we can just check that in
remove_self_joins_recurse, while we're collecting the base relations
that are considered to be candidates for removal.
This leads to the attached patch. This patch also includes some code
refactoring for the surrounding code.
Thanks
Richard
Attachments:
v1-0001-Fix-bogus-lateral-dependency-after-self-join-removal.patchapplication/octet-stream; name=v1-0001-Fix-bogus-lateral-dependency-after-self-join-removal.patchDownload
From a944899866fdf001475a819fd5e8f3e94c4aeb4c Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Thu, 2 May 2024 13:07:21 +0000
Subject: [PATCH v1] Fix bogus lateral dependency after self-join removal
---
src/backend/optimizer/plan/analyzejoins.c | 28 +++++++++++++++-----
src/test/regress/expected/join.out | 32 +++++++++++++++++++++++
src/test/regress/sql/join.sql | 16 ++++++++++++
3 files changed, 69 insertions(+), 7 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 506fccd20c..76f1dffb43 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -457,15 +457,25 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
+
phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
+
phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
/* ph_lateral might or might not be empty */
+
phv->phrels = replace_relid(phv->phrels, relid, subst);
phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
replace_varno((Node *) phv->phexpr, relid, subst);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
@@ -2292,25 +2302,29 @@ remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
if (IsA(jlnode, RangeTblRef))
{
- RangeTblRef *ref = (RangeTblRef *) jlnode;
- RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
/*
- * We only care about base relations from which we select
- * something.
+ * We only consider ordinary relations as candidates to be removed,
+ * and these relations should not have TABLESAMPLE clauses
+ * specified. Removing a relation with TABLESAMPLE clause could
+ * potentially change the syntax of the query.
*/
if (rte->rtekind == RTE_RELATION &&
rte->relkind == RELKIND_RELATION &&
- root->simple_rel_array[ref->rtindex] != NULL)
+ rte->tablesample == NULL)
{
- Assert(!bms_is_member(ref->rtindex, relids));
- relids = bms_add_member(relids, ref->rtindex);
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
}
}
else if (IsA(jlnode, List))
+ {
/* Recursively go inside the sub-joinlist */
toRemove = remove_self_joins_recurse(root, (List *) jlnode,
toRemove);
+ }
else
elog(ERROR, "unrecognized joinlist node type: %d",
(int) nodeTag(jlnode));
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 8b640c2fc2..24ad31a3ee 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6215,6 +6215,38 @@ select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
Filter: (b IS NOT NULL)
(6 rows)
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
-- Degenerated case.
explain (costs off)
select * from
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c4c6c7b8ba..132a721139 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2352,6 +2352,22 @@ where exists (select * from sj q
explain (costs off)
select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
-- Degenerated case.
explain (costs off)
select * from
--
2.34.1
Hi, Richard!
On Thu, May 2, 2024 at 4:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, May 2, 2024 at 6:08 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Thu, May 2, 2024 at 12:45 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:One question for me is: Do we anticipate other lateral self-references
except the TABLESAMPLE case? Looking into the extract_lateral_references
implementation, I see the only RTE_SUBQUERY case to be afraid of. But we
pull up subqueries before extracting lateral references. So, if we have
a reference to a subquery, it means we will not flatten this subquery
and don't execute SJE. Do we need more code, as you have written in the
first patch?I think my first patch was crap anyway. Your explanation seems
reasonable to me. I'm not sure this requires any more code. Probably
it would be enough to add more comments about this.The tablesample case is not the only factor that can cause a relation to
have a lateral dependency on itself after self-join removal. It can
also happen with PHVs. As an example, considerexplain (costs off)
select * from t t1
left join lateral
(select t1.a as t1a, * from t t2) t2
on true
where t1.a = t2.a;
server closed the connection unexpectedlyThis is because after self-join removal, a PlaceHolderInfo's ph_lateral
might contain rels mentioned in ph_eval_at, which we should get rid of.For the tablesample case, I agree that we should not consider relations
with TABLESAMPLE clauses as candidates to be removed. Removing such a
relation could potentially change the syntax of the query, as shown by
Alexander's example. It seems to me that we can just check that in
remove_self_joins_recurse, while we're collecting the base relations
that are considered to be candidates for removal.This leads to the attached patch. This patch also includes some code
refactoring for the surrounding code.
Great, thank you for your work on this!
I'd like to split this into separate patches for better granularity of
git history. I also added 0001 patch, which makes first usage of the
SJE acronym in file to come with disambiguation. Also, I've added
assert that ph_lateral and ph_eval_at didn't overlap before the
changes. I think this should help from the potential situation when
the changes we do could mask another bug.
I would appreciate your review of this patchset, and review from Andrei as well.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v2-0002-Minor-refactoring-for-self-join-elimination-code.patchapplication/octet-stream; name=v2-0002-Minor-refactoring-for-self-join-elimination-code.patchDownload
From 55667635034e5844b73b065928683d01bb874d40 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu, 2 May 2024 23:20:36 +0300
Subject: [PATCH v2 2/4] Minor refactoring for self-join elimination code
This commit makes some minor miscellaneous refactoring for the self-join
elimination code including:
* Empty lines for better code readability,
* Rearrangement of local variables in remove_self_joins_recurse() for the
sake of brevity and clarity,
* Remove redundant check for not null RangeTblEntry pointer, which has been
dereferenced before.
Discussion: https://postgr.es/m/CAMbWs49Q8g1LPVCeNHYv-Y-gFo826ertrg6nYNC9LL%3D8%3DzHP3g%40mail.gmail.com
Author: Richard Guo
Reviewed-by: Alexander Korotkov
---
src/backend/optimizer/plan/analyzejoins.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index b2a633ba4d7..a46c3ffaa5a 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -457,15 +457,20 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
+
phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
+
phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
/* ph_lateral might or might not be empty */
+
phv->phrels = replace_relid(phv->phrels, relid, subst);
phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
replace_varno((Node *) phv->phexpr, relid, subst);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
@@ -2292,25 +2297,26 @@ remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
if (IsA(jlnode, RangeTblRef))
{
- RangeTblRef *ref = (RangeTblRef *) jlnode;
- RangeTblEntry *rte = root->simple_rte_array[ref->rtindex];
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
/*
* We only care about base relations from which we select
* something.
*/
if (rte->rtekind == RTE_RELATION &&
- rte->relkind == RELKIND_RELATION &&
- root->simple_rel_array[ref->rtindex] != NULL)
+ rte->relkind == RELKIND_RELATION)
{
- Assert(!bms_is_member(ref->rtindex, relids));
- relids = bms_add_member(relids, ref->rtindex);
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
}
}
else if (IsA(jlnode, List))
+ {
/* Recursively go inside the sub-joinlist */
toRemove = remove_self_joins_recurse(root, (List *) jlnode,
toRemove);
+ }
else
elog(ERROR, "unrecognized joinlist node type: %d",
(int) nodeTag(jlnode));
--
2.39.3 (Apple Git-145)
v2-0003-Forbid-self-join-elimination-on-table-sampling-sc.patchapplication/octet-stream; name=v2-0003-Forbid-self-join-elimination-on-table-sampling-sc.patchDownload
From faef921c8abf145362530b8d020a450926361a50 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Fri, 3 May 2024 01:43:31 +0300
Subject: [PATCH v2 3/4] Forbid self-join elimination on table sampling scans
Removing relations with table sampling scans could lead to a change in the
query semantics. There are probably some situations when we can safely do
a self-join elimination by moving table sampling to the remaining relation.
But leave this till the significant interest in this area.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/0ed2cf02-e655-6ec7-a4fe-52fd8d572f65%40gmail.com
Author: Richard Guo, Andrei Lepikhov
Reviewed-by: Alexander Korotkov
---
src/backend/optimizer/plan/analyzejoins.c | 9 ++++++---
src/test/regress/expected/join.out | 19 +++++++++++++++++++
src/test/regress/sql/join.sql | 8 ++++++++
3 files changed, 33 insertions(+), 3 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index a46c3ffaa5a..d82d86b37f4 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -2301,11 +2301,14 @@ remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
RangeTblEntry *rte = root->simple_rte_array[varno];
/*
- * We only care about base relations from which we select
- * something.
+ * We only consider ordinary relations as candidates to be removed,
+ * and these relations should not have TABLESAMPLE clauses
+ * specified. Removing a relation with TABLESAMPLE clause could
+ * potentially change the semantics of the query.
*/
if (rte->rtekind == RTE_RELATION &&
- rte->relkind == RELKIND_RELATION)
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL)
{
Assert(!bms_is_member(varno, relids));
relids = bms_add_member(relids, varno);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 4e1288814ab..02765a7bc93 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6215,6 +6215,25 @@ select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
Filter: (b IS NOT NULL)
(6 rows)
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
-- Degenerated case.
explain (costs off)
select * from
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index f5d06dbffb1..0a4ea3df19c 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2352,6 +2352,14 @@ where exists (select * from sj q
explain (costs off)
select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
-- Degenerated case.
explain (costs off)
select * from
--
2.39.3 (Apple Git-145)
v2-0004-Fix-self-join-elimination-work-with-PlaceHolderIn.patchapplication/octet-stream; name=v2-0004-Fix-self-join-elimination-work-with-PlaceHolderIn.patchDownload
From f8f776c0338e7b923bed9fe80e2b3e5bb9326026 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Fri, 3 May 2024 01:51:13 +0300
Subject: [PATCH v2 4/4] Fix self-join elimination work with
PlaceHolderInfo.ph_lateral
After removing the relation participating in the later, relation may end up
depending on itself according to the PlaceHolderInfo.ph_lateral. This commit
makes remove_rel_from_query() revise PlaceHolderInfo.ph_lateral after
replacing relids there.
Discussion: https://postgr.es/m/CAMbWs49Q8g1LPVCeNHYv-Y-gFo826ertrg6nYNC9LL%3D8%3DzHP3g%40mail.gmail.com
Author: Richard Guo
Reviewed-by: Alexander Korotkov
---
src/backend/optimizer/plan/analyzejoins.c | 7 +++++++
src/test/regress/expected/join.out | 13 +++++++++++++
src/test/regress/sql/join.sql | 8 ++++++++
3 files changed, 28 insertions(+)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d82d86b37f4..b52cce05da8 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -462,7 +462,14 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement. We will remove them. But first check they aren't
+ * overlapping already.
+ */
+ Assert(!bms_overlap(phinfo->ph_lateral, phinfo->ph_eval_at));
phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
/* ph_lateral might or might not be empty */
phv->phrels = replace_relid(phv->phrels, relid, subst);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 02765a7bc93..1b0b11c048d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6234,6 +6234,19 @@ select * from sj t1
Filter: (t1.a = a)
(8 rows)
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
-- Degenerated case.
explain (costs off)
select * from
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 0a4ea3df19c..a9844892067 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2360,6 +2360,14 @@ select * from sj t1
(select * from sj tablesample system(t1.b)) s
on t1.a = s.a;
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
-- Degenerated case.
explain (costs off)
select * from
--
2.39.3 (Apple Git-145)
v2-0001-Clarify-the-SJE-self-join-elimination-acronym.patchapplication/octet-stream; name=v2-0001-Clarify-the-SJE-self-join-elimination-acronym.patchDownload
From 2fb894004592a12835ef890de24e9b91f88422ba Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu, 2 May 2024 23:09:09 +0300
Subject: [PATCH v2 1/4] Clarify the SJE (self-join elimination) acronym
This commit ensures each source file containing the SJE acronym has
disambiguation in its first occurrence.
---
src/backend/optimizer/plan/analyzejoins.c | 2 +-
src/test/regress/expected/join.out | 3 ++-
src/test/regress/sql/join.sql | 3 ++-
3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 506fccd20c9..b2a633ba4d7 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1705,7 +1705,7 @@ update_eclasses(EquivalenceClass *ec, int from, int to)
* "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
* which makes almost every RestrictInfo unique. This type of comparison is
* useful when removing duplicates while moving RestrictInfo's from removed
- * relation to remaining relation during self-join elimination.
+ * relation to remaining relation during self-join elimination (SJE).
*
* XXX: In the future, we might remove the 'rinfo_serial' field completely and
* get rid of this function.
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 8b640c2fc2f..4e1288814ab 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6802,7 +6802,8 @@ explain (costs off) select 1 from
reset join_collapse_limit;
reset enable_seqscan;
--- Check that clauses from the join filter list is not lost on the self-join removal
+-- Check that clauses from the join filter list is not lost on the self-join
+-- elimination (SJE)
CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c4c6c7b8ba2..f5d06dbffb1 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2608,7 +2608,8 @@ explain (costs off) select 1 from
reset join_collapse_limit;
reset enable_seqscan;
--- Check that clauses from the join filter list is not lost on the self-join removal
+-- Check that clauses from the join filter list is not lost on the self-join
+-- elimination (SJE)
CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
--
2.39.3 (Apple Git-145)
Alexander Korotkov <aekorotkov@gmail.com> writes:
I would appreciate your review of this patchset, and review from Andrei as well.
I hate to say this ... but if we're still finding bugs this
basic in SJE, isn't it time to give up on it for v17?
I might feel better about it if there were any reason to think
these were the last major bugs. But you have already committed
around twenty separate fixes for the original SJE patch, and
now here you come with several more; so it doesn't seem like
the defect rate has slowed materially. There can be no doubt
whatever that the original patch was far from commit-ready.
I think we should revert SJE for v17 and do a thorough design
review before trying again in v18.
regards, tom lane
On 5/3/24 06:19, Tom Lane wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
I would appreciate your review of this patchset, and review from Andrei as well.
I hate to say this ... but if we're still finding bugs this
basic in SJE, isn't it time to give up on it for v17?I might feel better about it if there were any reason to think
these were the last major bugs. But you have already committed
around twenty separate fixes for the original SJE patch, and
now here you come with several more; so it doesn't seem like
the defect rate has slowed materially. There can be no doubt
whatever that the original patch was far from commit-ready.I think we should revert SJE for v17 and do a thorough design
review before trying again in v18.
I need to say I don't see any evidence of bad design.
I think this feature follows the example of 2489d76 [1]Make Vars be outer-join-aware, 1349d27 [2]Improve performance of ORDER BY / DISTINCT aggregates,
and partitionwise join features — we get some issues from time to time,
but these strengths and frequencies are significantly reduced.
First and foremost, this feature is highly isolated: like the PWJ
feature, you can just disable (not enable?) SJE and it guarantees you
will avoid the problems.
Secondly, this feature reflects the design decisions the optimiser has
made before. It raises some questions: Do we really control the
consistency of our paths and the plan tree? Maybe we hide our
misunderstanding of its logic by extensively copying expression trees,
sometimes without fundamental necessity. Perhaps the optimiser needs
some abstraction layers or reconstruction to reduce the quickly growing
complexity.
A good example here is [1]Make Vars be outer-join-aware. IMO, the new promising feature it has
introduced isn't worth the complexity it added to the planner.
SJE, much like OR <-> ANY transformation, introduces a fresh perspective
into the planner: if we encounter a complex, redundant query, it may be
more beneficial to invest in simplifying the internal query
representation rather than adding new optimisations that will grapple
with this complexity.
Also, SJE raised questions I've never seen before, like: Could we
control the consistency of the PlannerInfo by changing something in the
logic?
Considering the current state, I don't see any concrete outcomes or
evidence that a redesign of the feature will lead us to a new path.
However, I believe the presence of SJE in the core could potentially
trigger improvements in the planner. As a result, I vote to stay with
the feature. But remember, as an author, I'm not entirely objective, so
let's wait for alternative opinions.
[1]: Make Vars be outer-join-aware
[2]: Improve performance of ORDER BY / DISTINCT aggregates
--
regards,
Andrei Lepikhov
Postgres Professional
Hi, Tom!
On Fri, May 3, 2024 at 2:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
I would appreciate your review of this patchset, and review from Andrei as well.
I hate to say this ... but if we're still finding bugs this
basic in SJE, isn't it time to give up on it for v17?I might feel better about it if there were any reason to think
these were the last major bugs. But you have already committed
around twenty separate fixes for the original SJE patch, and
now here you come with several more; so it doesn't seem like
the defect rate has slowed materially. There can be no doubt
whatever that the original patch was far from commit-ready.
I think if we subtract from the SJE followup commits containing
improvements (extra comments, asserts) and fix for in-place Bitmapset
modification, which was there before, the number of fixes will be
closer to ten. And the number of pending fixes will be two. But I
totally get your concern that we're quite late in the release cycle
and new SJE-related issues continue to arise. This leads to a
significant risk of raising many bugs for end users.
I think we should revert SJE for v17 and do a thorough design
review before trying again in v18.
I agree to revert it for v17, but I'm not exactly sure the issue is
design (nevertheless design review is very welcome as any other type
of review). The experience of the bugs arising with the SJE doesn't
show me a particular weak spot in the feature. It looks more like
this patch has to revise awfully a lot planner data structures to
replace one relid with another. And I don't see the design, which
could avoid that. Somewhere in the thread I have proposed a concept
of "alias relids". However, I suspect that could leave us with more
lurking bugs instead of the bug-free code.
I suggest we should give this feature more review and testing, then
commit early v18. That would leave us enough time to fix any other
issues before v18 release.
------
Regards,
Alexander Korotkov
Supabase
On Fri, May 3, 2024 at 4:57 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I agree to revert it for v17, but I'm not exactly sure the issue is
design (nevertheless design review is very welcome as any other type
of review). The experience of the bugs arising with the SJE doesn't
show me a particular weak spot in the feature. It looks more like
this patch has to revise awfully a lot planner data structures to
replace one relid with another. And I don't see the design, which
could avoid that. Somewhere in the thread I have proposed a concept
of "alias relids". However, I suspect that could leave us with more
lurking bugs instead of the bug-free code.
I agree that reverting it for v17 makes sense. In terms of moving
forward, whether a design review is exactly the right idea or not, I'm
not sure. However, I think that the need to replace relids in a lot of
places is something that a design review might potentially flag as a
problem. Maybe there is some other approach that can avoid the need
for this.
On the other hand, maybe there's not. But in that case, the question
becomes how the patch author(s), and committer, are going to make sure
that most of the issues get flushed out before the initial commit.
What we can't do is say - we know that we need to replace relids in a
bunch of places, so we'll change the ones we know about, and then rely
on testing to find any that we missed. There has to be some kind of
systematic plan that everyone can agree should find all of the
affected places, and then if a few slip through, that's fine, that's
how life goes.
I haven't followed the self-join elimination work very closely, and I
do quite like the idea of the feature. However, looking over all the
follow-up commits, it's pretty hard to escape the conclusion that
there were a lot of cases that weren't adequately considered in the
initial work (lateral, result relations, PHVs, etc.). And that is a
big problem -- it really creates a lot of issues for the project when
a major feature commit misses whole areas that it needs to have
considered, as plenty of previous history will show. When anybody
starts to realize that they've not just had a few goofs but have
missed some whole affected area entirely, it's time to start thinking
about a revert.
One of my most embarrassing gaffes in this area personally was
a448e49bcbe40fb72e1ed85af910dd216d45bad8. I don't know how I managed
to commit the original patch without realizing it was going to cause
an increase in the WAL size, but I can tell you that when I realized
it, my heart sank through the floor. I'd love to return to that work
if we can all ever agree on a way of addressing that problem, but in
the meantime, that patch is very dead. And ... if somebody had taken
the time to give me a really good design review of that patch, they
might well have noticed, and saved me the embarrassment of committing
something that had no shot of remaining in the tree. Unfortunately,
one of the downsides of being a committer is that you tend to get less
of that sort of review, because people assume you know what you're
doing. Which is fabulous, when you actually do know what you're doing,
and really sucks, when you don't. One of the things I'd like to see
discussed at 2024.pgconf.dev is how we can improve this aspect of how
we work together.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 3/5/2024 20:55, Robert Haas wrote:
One of my most embarrassing gaffes in this area personally was
a448e49bcbe40fb72e1ed85af910dd216d45bad8. I don't know how I managed
to commit the original patch without realizing it was going to cause
an increase in the WAL size, but I can tell you that when I realized
it, my heart sank through the floor.
I discovered this feature and agree that it looks like a severe problem.
Unfortunately, in the case of the SJE patch, the committer and reviewers
don't provide negative feedback. We see the only (I'm not sure I use the
proper English phrase) 'negative feelings' from people who haven't
reviewed or analysed it at all (at least, they didn't mention it).
Considering the situation, I suggest setting the default value of
enable_self_join_removal to false in PG17 for added safety and then
changing it to true in early PG18.
Having no objective negative feedback, we have no reason to change
anything in the design or any part of the code. It looks regrettable and
unusual.
After designing the feature, fixing its bugs, and reviewing joint
patches on the commitfest, the question more likely lies in the planner
design. For example, I wonder if anyone here knows why exactly the
optimiser makes a copy of the whole query subtree in some places.
Another example is PlannerInfo. Can we really control all the
consequences of introducing, let's say, a new JoinDomain entity?
You also mentioned 2024.pgconf.dev. Considering the current migration
policy in some countries, it would be better to work through the online
presence as equivalent to offline. Without an online part of the
conference, the only way to communicate and discuss is through this
mailing list.
--
regards,
Andrei Lepikhov
On Sat, May 4, 2024 at 10:46 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
Having no objective negative feedback, we have no reason to change
anything in the design or any part of the code. It looks regrettable and
unusual.
To me, this sounds like you think it's someone else's job to tell you
what is wrong with the patch, or how to fix it, and if they don't,
then you should get to have the patch as part of PostgreSQL. But that
is not how we do things, nor should we. I agree that it sucks when you
need feedback and don't get it, and I've written about that elsewhere
and recently. But if you don't get feedback and as a result you can't
get the patch to an acceptable level, or if you do get feedback but
the patch fails to reach an acceptable level anyway, then the only
correct decision is for us to not ship that code. That obviously sucks
from the point of view of the patch author, and also of the committer,
but consider the alternative. Once patches get through an initial
release and become part of the product, the responsibility for fixing
problems is understood to slowly move from the original committer to
the community as a whole. In practice, that means that a lot of the
work of fixing things that are broken, after some initial period, ends
up falling on committers other than the person who did the initial
commit. Even one or two problematic commits can generate an enormous
amount of work for people who weren't involved in the original
development and may not even have agreed with the development
direction, and it is more than fair for those people to express a view
about whether they are willing to carry that burden or not. When they
aren't, I do think that's regrettable, but I don't think it's unusual.
Just in this release, we've removed at least two previously-released
features because they're in bad shape and nobody's willing to maintain
them (snapshot too old, AIX support).
After designing the feature, fixing its bugs, and reviewing joint
patches on the commitfest, the question more likely lies in the planner
design. For example, I wonder if anyone here knows why exactly the
optimiser makes a copy of the whole query subtree in some places.
Another example is PlannerInfo. Can we really control all the
consequences of introducing, let's say, a new JoinDomain entity?
Bluntly, if you can't control those consequences, then you aren't
allowed to make that change.
I know first-hand how difficult some of these problems are. Sometime
in the last year or three, I spent weeks getting rid of ONE global
variable (ThisTimeLineID). It took an absolutely inordinate amount of
time, and it became clear to me that I was never going to get rid of
enough global variables in that part of the code to be able to write a
patch for the feature I wanted without risk of unforeseen
consequences. So I gave up on the entire feature. Maybe I'll try again
at some point, or maybe somebody else will feel like cleaning up that
code and then I can try again with a cleaner base, but what I don't
get to do is write a buggy patch for the feature I want and commit it
anyway. I either figure out a way to do it that I believe is low-risk
and that the community judges to be acceptable, or I don't do it.
I want to go on record right now as disagreeing with the plan proposed
in the commit message for the revert commit, namely, committing this
again early in the v18 cycle. I don't think Tom would have proposed
reverting this feature unless he believed that it had more serious
problems than could be easily fixed in a short period of time. I think
that concern is well-founded, given the number of fixes that were
committed. It seems likely that the patch needs significant rework and
stabilization before it gets committed again, and I think it shouldn't
be committed again without explicit agreement from Tom or one of the
other committers who have significant experience with the query
planner. That is not to say that I don't approve generally of the idea
of committing things earlier in the release cycle: I certainly do. It
gives us more time to shake out problems with patches before we ship.
But it only makes sense if we collectively believe that the patch is
mostly correct, and only needs fine-tuning, and I think there are good
reasons to believe that we shouldn't have that level of confidence in
this case.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, May 6, 2024 at 10:44:33AM -0400, Robert Haas wrote:
I want to go on record right now as disagreeing with the plan proposed
in the commit message for the revert commit, namely, committing this
again early in the v18 cycle. I don't think Tom would have proposed
reverting this feature unless he believed that it had more serious
problems than could be easily fixed in a short period of time. I think
that concern is well-founded, given the number of fixes that were
committed. It seems likely that the patch needs significant rework and
stabilization before it gets committed again, and I think it shouldn't
be committed again without explicit agreement from Tom or one of the
other committers who have significant experience with the query
planner. That is not to say that I don't approve generally of the idea
of committing things earlier in the release cycle: I certainly do. It
gives us more time to shake out problems with patches before we ship.
But it only makes sense if we collectively believe that the patch is
mostly correct, and only needs fine-tuning, and I think there are good
reasons to believe that we shouldn't have that level of confidence in
this case.
I think what Robert is saying is that it is an unacceptable plan to just
dump the code into PG 18 and clean it up in the following months --- it
needs more research before it is re-added to git.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.
Robert Haas <robertmhaas@gmail.com> writes:
I want to go on record right now as disagreeing with the plan proposed
in the commit message for the revert commit, namely, committing this
again early in the v18 cycle. I don't think Tom would have proposed
reverting this feature unless he believed that it had more serious
problems than could be easily fixed in a short period of time. I think
that concern is well-founded, given the number of fixes that were
committed. It seems likely that the patch needs significant rework and
stabilization before it gets committed again, and I think it shouldn't
be committed again without explicit agreement from Tom or one of the
other committers who have significant experience with the query
planner.
FWIW I accept some of the blame here, for not having paid any
attention to the SJE work earlier. I had other things on my mind
for most of last year, and not enough bandwidth to help.
The main thing I'd like to understand before we try this again is
why SJE needed so much new query-tree-manipulation infrastructure.
I would have expected it to be very similar to the left-join
elimination we do already, and therefore to mostly just share the
existing infrastructure. (I also harbor suspicions that some of
the new code existed just because someone didn't research what
was already there --- for instance, the now-removed replace_varno
sure looks like ChangeVarNodes should have been used instead.)
Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
result relation). While I don't claim that that destroyed the entire
use case for SJE, it certainly knocked its usefulness down by many
notches, maybe even to the point where it's not worth putting in the
effort needed to get it to re-committability. So I think we need to
look harder at finding a way around that. Is the concern that
RETURNING should return either old or new values depending on which
RTE is mentioned? If so, maybe the feature Dean has proposed to
allow RETURNING to access old values [1]/messages/by-id/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com is a prerequisite to moving
forward. Alternatively, perhaps it'd be good enough to forbid SJE
only when the non-target relation is actually mentioned in RETURNING.
regards, tom lane
[1]: /messages/by-id/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
On 6/5/2024 21:44, Robert Haas wrote:
On Sat, May 4, 2024 at 10:46 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:Having no objective negative feedback, we have no reason to change
anything in the design or any part of the code. It looks regrettable and
unusual.To me, this sounds like you think it's someone else's job to tell you
what is wrong with the patch, or how to fix it, and if they don't,
then you should get to have the patch as part of PostgreSQL. But that
is not how we do things, nor should we. I agree that it sucks when you
need feedback and don't get it, and I've written about that elsewhere
and recently. But if you don't get feedback and as a result you can't
get the patch to an acceptable level,
I'm really sorry that the level of my language caused a misunderstanding.
The main purpose of this work is to form a more or less certain view of
the direction of the planner's development.
Right now, it evolves extensively - new structures, variables,
alternative copies of the same node trees with slightly changed
properties ... This way allows us to quickly introduce some planning
features (a lot of changes in planner logic since PG16 is evidence of
that) and with still growing computing resources it allows postgres to
fit RAM and proper planning time. But maybe we want to be more modest?
The Ashutosh's work he has been doing this year shows how sometimes
expensive the planner is. Perhaps we want machinery that will check the
integrity of planning data except the setrefs, which fail to detect that
occasionally?
If an extensive approach is the only viable option, then it's clear that
this and many other features are simply not suitable for Postgres
Planner. It's disheartening that this patch didn't elicit such
high-level feedback.
--
regards,
Andrei Lepikhov
On Mon, May 6, 2024 at 12:01 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
Right now, it evolves extensively - new structures, variables,
alternative copies of the same node trees with slightly changed
properties ... This way allows us to quickly introduce some planning
features (a lot of changes in planner logic since PG16 is evidence of
that) and with still growing computing resources it allows postgres to
fit RAM and proper planning time. But maybe we want to be more modest?
The Ashutosh's work he has been doing this year shows how sometimes
expensive the planner is. Perhaps we want machinery that will check the
integrity of planning data except the setrefs, which fail to detect that
occasionally?
If an extensive approach is the only viable option, then it's clear that
this and many other features are simply not suitable for Postgres
Planner. It's disheartening that this patch didn't elicit such
high-level feedback.
Well, as I said before, I think self-join elimination is a good
feature, and I believe that it belongs in PostgreSQL. However, I don't
believe that this implementation was done as well as it needed to be
done. A great deal of the work involved in a feature like this lies in
figuring out at what stage of processing certain kinds of
transformations ought to be done, and what cleanup is needed
afterward. It is difficult for anyone to get that completely right the
first time around; left join elimination also provoked a series of
after-the-fact bug fixes. However, I think those were fewer in number
and spread over a longer period of time.
Now that being said, I do also agree that the planner code is quite
hard to understand, for various reasons. I don't think the structure
of that code and the assumptions underlying it are as well-documented
as they could be, and neither do I think that all of them are optimal.
It has taken me a long time to learn as much as I know, and there is
still quite a lot that I don't know. And I also agree that the planner
does an unfortunate amount of in-place modification of existing
structures without a lot of clarity about how it all works, and an
unfortunate amount of data copying in some places, and even that the
partition-wise join code isn't all that it could be. But I do not
think that adds up to a conclusion that we should just be less
ambitious with planner changes. Indeed, I would like to see us do
more. There is certainly a lot of useful work that could be done. The
trick is figuring out how to do it without breaking too many things,
and that is not easy.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, May 6, 2024 at 12:24:41PM -0400, Robert Haas wrote:
Now that being said, I do also agree that the planner code is quite
hard to understand, for various reasons. I don't think the structure
of that code and the assumptions underlying it are as well-documented
as they could be, and neither do I think that all of them are optimal.
It has taken me a long time to learn as much as I know, and there is
still quite a lot that I don't know. And I also agree that the planner
does an unfortunate amount of in-place modification of existing
structures without a lot of clarity about how it all works, and an
unfortunate amount of data copying in some places, and even that the
partition-wise join code isn't all that it could be. But I do not
think that adds up to a conclusion that we should just be less
ambitious with planner changes. Indeed, I would like to see us do
more. There is certainly a lot of useful work that could be done. The
trick is figuring out how to do it without breaking too many things,
and that is not easy.
I agree with Robert. While writting the Postgres 17 release notes, I am
excited to see the many optimizer improvements, and removing self-joins
from that list will be unfortunate.
I did write a blog entry in 2021 that suggested we could have
optimizer aggressiveness control to allow for more expensive
optimizations:
https://momjian.us/main/blogs/pgblog/2021.html#May_14_2021
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.
On Mon, May 6, 2024 at 6:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I want to go on record right now as disagreeing with the plan proposed
in the commit message for the revert commit, namely, committing this
again early in the v18 cycle. I don't think Tom would have proposed
reverting this feature unless he believed that it had more serious
problems than could be easily fixed in a short period of time. I think
that concern is well-founded, given the number of fixes that were
committed. It seems likely that the patch needs significant rework and
stabilization before it gets committed again, and I think it shouldn't
be committed again without explicit agreement from Tom or one of the
other committers who have significant experience with the query
planner.FWIW I accept some of the blame here, for not having paid any
attention to the SJE work earlier. I had other things on my mind
for most of last year, and not enough bandwidth to help.The main thing I'd like to understand before we try this again is
why SJE needed so much new query-tree-manipulation infrastructure.
I would have expected it to be very similar to the left-join
elimination we do already, and therefore to mostly just share the
existing infrastructure. (I also harbor suspicions that some of
the new code existed just because someone didn't research what
was already there --- for instance, the now-removed replace_varno
sure looks like ChangeVarNodes should have been used instead.)
Thank you for pointing this. This area certainly requires more investigation.
Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
result relation). While I don't claim that that destroyed the entire
use case for SJE, it certainly knocked its usefulness down by many
notches, maybe even to the point where it's not worth putting in the
effort needed to get it to re-committability. So I think we need to
look harder at finding a way around that. Is the concern that
RETURNING should return either old or new values depending on which
RTE is mentioned? If so, maybe the feature Dean has proposed to
allow RETURNING to access old values [1] is a prerequisite to moving
forward. Alternatively, perhaps it'd be good enough to forbid SJE
only when the non-target relation is actually mentioned in RETURNING.
Another problem is EPQ. During EPQ, we use most recent tuples for the
target relation and snapshot-satisfying tuples for joined relations.
And that affects RETURNING as well. If we need to return values for
joined relation, that wouldn't be old values, but values of
snapshot-satisfying tuple which might be even older.
Proper support of this looks like quite amount of work for me.
Committing SJE to v18 with this looks challenging. AFICS, going this
way would require substantial help from you.
------
Regards,
Alexander Korotkov
Supabase
On Mon, May 6, 2024 at 5:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
I want to go on record right now as disagreeing with the plan proposed
in the commit message for the revert commit, namely, committing this
again early in the v18 cycle. I don't think Tom would have proposed
reverting this feature unless he believed that it had more serious
problems than could be easily fixed in a short period of time. I think
that concern is well-founded, given the number of fixes that were
committed. It seems likely that the patch needs significant rework and
stabilization before it gets committed again, and I think it shouldn't
be committed again without explicit agreement from Tom or one of the
other committers who have significant experience with the query
planner. That is not to say that I don't approve generally of the idea
of committing things earlier in the release cycle: I certainly do. It
gives us more time to shake out problems with patches before we ship.
But it only makes sense if we collectively believe that the patch is
mostly correct, and only needs fine-tuning, and I think there are good
reasons to believe that we shouldn't have that level of confidence in
this case.
I agree it was a hurry to put the plan into commit message. I think
Tom already gave valuable feedback [1] and probably we will get more.
So, plan is to be decided. One way or the other I'm not going to
re-commit this without explicit Tom's consent.
Links.
1. /messages/by-id/3622801.1715010885@sss.pgh.pa.us
------
Regards,
Alexander Korotkov
Supabase
On Mon, May 6, 2024 at 3:27 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I agree it was a hurry to put the plan into commit message. I think
Tom already gave valuable feedback [1] and probably we will get more.
So, plan is to be decided. One way or the other I'm not going to
re-commit this without explicit Tom's consent.
Thanks. I hope we find a way to make it happen.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 5/7/24 02:59, Robert Haas wrote:
On Mon, May 6, 2024 at 3:27 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I agree it was a hurry to put the plan into commit message. I think
Tom already gave valuable feedback [1] and probably we will get more.
So, plan is to be decided. One way or the other I'm not going to
re-commit this without explicit Tom's consent.Thanks. I hope we find a way to make it happen.
Rebased onto current master. Nothing new except rebase-related changes
and some comment fixes.
--
regards,
Andrei Lepikhov
Postgres Professional
Attachments:
0001-Remove-useless-self-joins-v49.patchtext/x-patch; charset=UTF-8; name=0001-Remove-useless-self-joins-v49.patchDownload
From 6646a2d15854e292c41ab13c4ca21962b89639f5 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 7 May 2024 12:04:13 +0700
Subject: [PATCH] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov, Jian He
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1323 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 7 +-
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 35 +-
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1064 +++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 486 ++++++++
src/tools/pgindent/typedefs.list | 3 +
15 files changed, 2964 insertions(+), 82 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 698169afdb..7918c4ba74 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5593,6 +5593,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c0fcc7d78d..2230b13104 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3440,6 +3440,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3495,6 +3511,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3546,6 +3563,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3588,7 +3623,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index c3fd4a81f8..b90b9e7ea2 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -31,10 +32,28 @@
#include "optimizer/restrictinfo.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. It is needed to perform a sorting procedure and simplify
+ * the search of SJE-candidate baserels referencing the same database relation.
+ * Having collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts to
+ * remove self-joins.
+ * Preliminary sorting prevents quadratic behaviour that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -42,14 +61,18 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static void replace_varno(Node *node, int from, int to);
+static Bitmapset *replace_relid(Relids relids, int oldId, int newId);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -87,7 +110,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -291,8 +314,8 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
continue; /* not mergejoinable */
/*
- * Check if the clause has the form "outer op inner" or "inner op
- * outer", and if so mark which side is inner.
+ * Check if clause has the form "outer op inner" or "inner op outer",
+ * and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
innerrel->relids))
@@ -306,7 +329,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -318,31 +341,27 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
- * Remove references to the rel from other baserels' attr_needed arrays.
+ * Remove references to the rel from other baserels' attr_needed arrays
+ * and lateral_vars lists.
*/
for (rti = 1; rti < root->simple_rel_array_size; rti++)
{
@@ -364,19 +383,22 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ replace_relid(otherrel->attr_needed[attroff], relid, subst);
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], ojrelid);
+ replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
+
+ /* Update lateral_vars list. */
+ replace_varno((Node *) otherrel->lateral_vars, relid, subst);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -390,30 +412,22 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- /*
- * initsplan.c is fairly cavalier about allowing SpecialJoinInfos'
- * lefthand/righthand relid sets to be shared with other data
- * structures. Ensure that we don't modify the original relid sets.
- * (The commute_xxx sets are always per-SpecialJoinInfo though.)
- */
- sjinf->min_lefthand = bms_copy(sjinf->min_lefthand);
- sjinf->min_righthand = bms_copy(sjinf->min_righthand);
- sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
- sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
+
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst);
}
/*
@@ -434,10 +448,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -447,18 +461,58 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ replace_varno((Node *) phv->phexpr, relid, subst);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
/*
* Remove any joinquals referencing the rel from the joininfo lists.
@@ -856,9 +910,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the outer_exprs, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -871,10 +930,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1188,9 +1248,33 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
+ * additional clauses from a baserestrictinfo list that were used to prove
+ * uniqueness. A non NULL 'extra_clauses' indicates that we're checking
+ * for self-join and correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1205,17 +1289,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels, because we need extra clauses to be valid for our case.
+ * Also, for self-join checking we've filtered the clauses list. Thus,
+ * for a self-join search, we can match only the result cached for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1232,7 +1327,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1245,10 +1341,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1294,7 +1396,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1324,17 +1427,1083 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * replace_varno - find in the given tree any Vars, PlaceHolderVar, and Relids
+ * that reference the removing relid, and change them to the reference to
+ * the replacement relid.
+ *
+ * NOTE: although this has the form of a walker, we cheat and modify the
+ * nodes in-place.
+ */
+
+typedef struct
+{
+ int from;
+ int to;
+ int sublevels_up;
+} ReplaceVarnoContext;
+
+static bool
+replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
+{
+ if (node == NULL)
+ return false;
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (var->varno == ctx->from &&
+ var->varlevelsup == ctx->sublevels_up)
+ {
+ var->varno = ctx->to;
+ var->varnosyn = ctx->to;
+ }
+ return false;
+ }
+ else if (IsA(node, PlaceHolderVar))
+ {
+ PlaceHolderVar *phv = (PlaceHolderVar *) node;
+
+ if (phv->phlevelsup == ctx->sublevels_up)
+ {
+ phv->phrels =
+ replace_relid(phv->phrels, ctx->from, ctx->to);
+ phv->phnullingrels =
+ replace_relid(phv->phnullingrels, ctx->from, ctx->to);
+ }
+
+ /* fall through to recurse into the placeholder's expression */
+ }
+ else if (IsA(node, Query))
+ {
+ /* Recurse into subselects */
+ bool result;
+
+ ctx->sublevels_up++;
+ result = query_tree_walker((Query *) node,
+ replace_varno_walker,
+ (void *) ctx,
+ QTW_EXAMINE_SORTGROUP);
+ ctx->sublevels_up--;
+ return result;
+ }
+ else if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+
+ if (bms_is_member(ctx->from, rinfo->clause_relids))
+ {
+ replace_varno((Node *) rinfo->clause, ctx->from, ctx->to);
+ replace_varno((Node *) rinfo->orclause, ctx->from, ctx->to);
+ rinfo->clause_relids =
+ replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
+ rinfo->left_relids =
+ replace_relid(rinfo->left_relids, ctx->from, ctx->to);
+ rinfo->right_relids =
+ replace_relid(rinfo->right_relids, ctx->from, ctx->to);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ replace_relid(rinfo->required_relids, ctx->from, ctx->to);
+
+ rinfo->outer_relids =
+ replace_relid(rinfo->outer_relids, ctx->from, ctx->to);
+ rinfo->incompatible_relids =
+ replace_relid(rinfo->incompatible_relids, ctx->from, ctx->to);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ relid == ctx->to && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+
+ return false;
+ }
+
+ return expression_tree_walker(node, replace_varno_walker,
+ (void *) ctx);
+}
+
+static void
+replace_varno(Node *node, int from, int to)
+{
+ ReplaceVarnoContext ctx;
+
+ if (to <= 0)
+ return;
+
+ ctx.from = from;
+ ctx.to = to;
+ ctx.sublevels_up = 0;
+
+ /*
+ * Must be prepared to start with a Query or a bare expression tree.
+ */
+ query_or_expression_tree_walker(node,
+ replace_varno_walker,
+ (void *) &ctx,
+ QTW_EXAMINE_SORTGROUP);
+}
+
+/*
+ * Substitute newId by oldId in relids.
+ *
+ * We must make a copy of the original Bitmapset before making any
+ * modifications, because the same pointer to it might be shared among
+ * different places.
+ * Also, this function can be used in 'delete only' mode (newId < 0).
+ * It allows us to utilise the same code in the remove_useless_joins and the
+ * remove_self_joins features.
+ */
+static Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (oldId < 0)
+ return relids;
+
+ /* Delete relid without substitution. */
+ if (newId < 0)
+ return bms_del_member(bms_copy(relids), oldId);
+
+ /* Substitute newId for oldId. */
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+ ListCell *lc;
+ ListCell *lc1;
+
+ foreach(lc, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ replace_varno((Node *) em->em_expr, from, to);
+
+ foreach(lc1, new_members)
+ {
+ EquivalenceMember *other = lfirst_node(EquivalenceMember, lc1);
+
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach(lc, ec->ec_sources)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ replace_varno((Node *) rinfo, from, to);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach(lc1, new_sources)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, lc1);
+
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid != -1);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach(lc, joininfos)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach(lc, toRemove->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ replace_varno((Node *) rinfo, toRemove->relid, toKeep->relid);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach(lc, binfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach(lc, jinfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->joininfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+
+ replace_varno(node, toRemove->relid, toKeep->relid);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures */
+ replace_varno((Node *) root->parse, toRemove->relid, toKeep->relid);
+
+ /* See remove_self_joins_one_group() */
+ Assert(root->parse->resultRelation != toRemove->relid);
+ Assert(root->parse->resultRelation != toKeep->relid);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ replace_varno((Node *) root->processed_tlist,
+ toRemove->relid, toKeep->relid);
+ replace_varno((Node *) root->processed_groupClause,
+ toRemove->relid, toKeep->relid);
+ replace_relid(root->all_result_relids, toRemove->relid, toKeep->relid);
+ replace_relid(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ replace_varno(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids));
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ replace_varno((Node *) clause, relid, outer->relid);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ /*
+ * We don't accept result relation as either source or target relation
+ * of SJE, because result relation has different behavior in
+ * EvalPlanQual() and RETURNING clause.
+ */
+ if (root->parse->resultRelation == r)
+ continue;
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ if (root->parse->resultRelation == k)
+ continue;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be removed,
+ * and these relations should not have TABLESAMPLE clauses
+ * specified. Removing a relation with TABLESAMPLE clause could
+ * potentially change the syntax of the query.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "SJE failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5c3..075d36c7ec 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -230,6 +230,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 1c69c6e97e..c5a8d00bb3 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -662,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
@@ -670,6 +670,9 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
* be an RTE corresponding to each setop's output.
+ * Note, we use this not subquery's targetlist but subroot->parse's
+ * targetlist, because it was revised by self-join removal. subquery's
+ * targetlist might contain the references to the removed relids.
*/
if (pNumGroups)
{
@@ -682,7 +685,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 46c258be28..d17a4c7066 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -986,6 +986,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 2ba297c117..b59602a18b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -728,7 +728,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -967,7 +967,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3432,4 +3432,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 5e88c0224a..fff4b69dfd 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index aafc173792..f2e3fa4c2e 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -108,6 +109,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..3d5de28354 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 6b16c3a676..1632777918 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6158,6 +6158,1070 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: (oid < '10'::oid)
+(6 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+-------------------------------------------
+ HashSetOp Except All
+ -> Append
+ -> Subquery Scan on "*SELECT* 1"
+ -> Seq Scan on emp1 c2
+ -> Subquery Scan on "*SELECT* 2"
+ -> Seq Scan on emp1 c3
+(6 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index dbfd0c13d4..2f3eb4e7f1 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -153,10 +153,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 8bfe3b7ba6..0cc6c6922c 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2321,6 +2321,492 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4f57078d13..6cf897e7b2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -388,6 +388,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2563,6 +2564,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -3989,6 +3991,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.2
Hi!
On Thu, Jun 13, 2024 at 6:45 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
On 5/7/24 02:59, Robert Haas wrote:
On Mon, May 6, 2024 at 3:27 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I agree it was a hurry to put the plan into commit message. I think
Tom already gave valuable feedback [1] and probably we will get more.
So, plan is to be decided. One way or the other I'm not going to
re-commit this without explicit Tom's consent.Thanks. I hope we find a way to make it happen.
Rebased onto current master. Nothing new except rebase-related changes
and some comment fixes.
Thank you. I've registered the patch on commitfest.
Do you plan to address the feedback from Tom Lane?
------
Regards,
Alexander Korotkov
Supabase
On Mon, May 6, 2024 at 11:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I want to go on record right now as disagreeing with the plan proposed
in the commit message for the revert commit, namely, committing this
again early in the v18 cycle. I don't think Tom would have proposed
reverting this feature unless he believed that it had more serious
problems than could be easily fixed in a short period of time. I think
that concern is well-founded, given the number of fixes that were
committed. It seems likely that the patch needs significant rework and
stabilization before it gets committed again, and I think it shouldn't
be committed again without explicit agreement from Tom or one of the
other committers who have significant experience with the query
planner.FWIW I accept some of the blame here, for not having paid any
attention to the SJE work earlier. I had other things on my mind
for most of last year, and not enough bandwidth to help.The main thing I'd like to understand before we try this again is
why SJE needed so much new query-tree-manipulation infrastructure.
I would have expected it to be very similar to the left-join
elimination we do already, and therefore to mostly just share the
existing infrastructure. (I also harbor suspicions that some of
the new code existed just because someone didn't research what
was already there --- for instance, the now-removed replace_varno
sure looks like ChangeVarNodes should have been used instead.)
i have looked around the code.
about replace_varno and ChangeVarNodes:
ChangeVarNodes
have
````
if (IsA(node, RangeTblRef))
{
RangeTblRef *rtr = (RangeTblRef *) node;
if (context->sublevels_up == 0 &&
rtr->rtindex == context->rt_index)
rtr->rtindex = context->new_index;
/* the subquery itself is visited separately */
return false;
}
````
if ChangeVarNodes executed the above code in remove_useless_self_joins and
remove_self_joins_recurse. the joinlist(RangeTblRef) will change from (1,2)
to (2,2). then later, remove_rel_from_joinlist cannot remove the 1,
*nremoved will be zero.
then the below code error branch will be executed.
````
joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
if (nremoved != 1)
elog(ERROR, "failed to find relation %d in joinlist", relid);
```
---------------------------------------------------------------------
replace_varno and replace_varno_walker didn't replace
Query->resultRelation, Query->mergeTargetRelation
as ChangeVarNodes did.
then replace_varno will have problems with DELETE, UPDATE, MERGE
someway.
ChangeVarNodes solved this problem.
Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
result relation). While I don't claim that that destroyed the entire
use case for SJE, it certainly knocked its usefulness down by many
notches, maybe even to the point where it's not worth putting in the
effort needed to get it to re-committability. So I think we need to
look harder at finding a way around that. Is the concern that
RETURNING should return either old or new values depending on which
RTE is mentioned? If so, maybe the feature Dean has proposed to
allow RETURNING to access old values [1] is a prerequisite to moving
forward. Alternatively, perhaps it'd be good enough to forbid SJE
only when the non-target relation is actually mentioned in RETURNING.regards, tom lane
[1] /messages/by-id/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
if only SELECT, no worth to make it being committed,
do you think support DML but no support RETURNING worth the effort?
excerpt from [1] latest patch:
+/* Returning behavior for Vars in RETURNING list */
+typedef enum VarReturningType
+{
+ VAR_RETURNING_DEFAULT, /* return OLD for DELETE, else return NEW */
+ VAR_RETURNING_OLD, /* return OLD for DELETE/UPDATE, else NULL */
+ VAR_RETURNING_NEW, /* return NEW for INSERT/UPDATE, else NULL */
+} VarReturningType;
+
typedef struct Var
{
Expr xpr;
@@ -265,6 +278,9 @@ typedef struct Var
*/
Index varlevelsup;
+ /* returning type of this var (see above) */
+ VarReturningType varreturningtype;
--------------------------------------------
example. e.g.
explain(costs off)
WITH t1 AS (SELECT * FROM emp1) UPDATE emp1 SET code = t1.code + 1
FROM t1 WHERE t1.id = emp1.id RETURNING emp1.code, t1.code;
the returning (emp1.code,t1.code) these two var the VarReturningType
is VAR_RETURNING_DEFAULT.
That means the patch (support-returning-old-new-v9.patch in [1]) see
the RETURNING (emp1.code, t1.code) are two different table.column
references.
but here we need to transform it to
"RETURNING new.code, old.code", i think.
that would be way more harder.
Alternatively, perhaps it'd be good enough to forbid SJE
only when the non-target relation is actually mentioned in RETURNING.
i will try to explore this area. in this case would be
allow SJE apply to " RETURNING t1.code, t1.code".
I've attached 2 patches, based on the latest patch in this thread.
0001 mainly about replacing all replace_varno to ChangeVarNodes.
0002 makes SJE support for DML without RETURNING clause.
now SJE also works with updatable view. for example:
+CREATE TABLE sj_target (tid integer primary key, balance integer)
WITH (autovacuum_enabled=off);
+INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5,
50), (6, 60);
+create view rw_sj_target as select * from sj_target where tid >= 2;
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON
t.tid = s.tid
+ WHEN MATCHED AND t.balance = 20
+ THEN update set balance = t.balance + 2;
+ QUERY PLAN
+-------------------------------------------------
+ Merge on sj_target
+ -> Bitmap Heap Scan on sj_target
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(5 rows)
Show quoted text
[1] /messages/by-id/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
Attachments:
v2-0002-make-SJE-to-apply-DML.patchtext/x-patch; charset=US-ASCII; name=v2-0002-make-SJE-to-apply-DML.patchDownload
From ce09cd2b960d489f0b2c9ffc7abf1a54c774efc5 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 14 Jun 2024 22:16:54 +0800
Subject: [PATCH v2 2/2] make SJE to apply DML
make SJE to apply DML (MERGE, UPDATE, INSERT, DELETE).
replace_varno and replace_varno_walker didn't replace
Query->resultRelation, Query->mergeTargetRelation
as ChangeVarNodes did.
So replace_varno will have problem with DELETE, UPDATE, MERGE.
ChangeVarNodes solved this problem.
* from the INSERT synopsis, we can see that INSERT cannot join with another table,
so SJE cannot be used.
DELETE/UPDATE
excerpt from the synopsis:
UPDATE table_name FROM from_item
DELETE FROM table_name USING from_item
as you can see UPDATE/DELETE, the join order is fixed. so we only need to check
if the table_name is totally replaced by from_item or not.
MERGE command: (merge_insert, merge_update, merge_delete).
for merge_insert: SJE cannot to applicable. it will out in follow code,
for which, i cannot fully explain the reasoning:
```
/*
* It is impossible to eliminate join of two relations if they
* belong to different rules of order. Otherwise planner can't be
* able to find any variants of correct query plan.
*/
foreach(lc, root->join_info_list)
{
SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
if ((bms_is_member(k, info->syn_lefthand) ^
bms_is_member(r, info->syn_lefthand)) ||
(bms_is_member(k, info->syn_righthand) ^
bms_is_member(r, info->syn_righthand)))
{
jinfo_check = false;
break;
}
}
if (!jinfo_check)
continue;
```
merge_delete and merge_update works fine with base table and updateable view.
but SJE cannot work with non-updatable view.
---
src/backend/optimizer/plan/analyzejoins.c | 21 +-
src/test/regress/expected/join.out | 309 +++++++++++++++++-
src/test/regress/expected/updatable_views.out | 17 +-
src/test/regress/sql/join.sql | 97 ++++++
4 files changed, 401 insertions(+), 43 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index bb145977..eab39ee0 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1860,10 +1860,6 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
/* restore the rangetblref in a proper order. */
restore_rangetblref((Node *) root->parse, toKeep->relid, toRemove->relid, 0, 0);
- /* See remove_self_joins_one_group() */
- Assert(root->parse->resultRelation != toRemove->relid);
- Assert(root->parse->resultRelation != toKeep->relid);
-
/* Replace links in the planner info */
remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
@@ -2046,14 +2042,6 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
{
RelOptInfo *inner = root->simple_rel_array[r];
- /*
- * We don't accept result relation as either source or target relation
- * of SJE, because result relation has different behavior in
- * EvalPlanQual() and RETURNING clause.
- */
- if (root->parse->resultRelation == r)
- continue;
-
k = r;
while ((k = bms_next_member(relids, k)) > 0)
@@ -2069,9 +2057,6 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
PlanRowMark *imark = NULL;
List *uclauses = NIL;
- if (root->parse->resultRelation == k)
- continue;
-
/* A sanity check: the relations have the same Oid. */
Assert(root->simple_rte_array[k]->relid ==
root->simple_rte_array[r]->relid);
@@ -2391,6 +2376,12 @@ remove_useless_self_joins(PlannerInfo *root, List *joinlist)
(list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
return joinlist;
+ /*
+ * we don't accept RETURNING clause with SJE.
+ */
+ if (root->parse && root->parse->returningList != NIL)
+ return joinlist;
+
/*
* Merge pairs of relations participated in self-join. Remove unnecessary
* range table entries.
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 16327779..6209854a 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6385,6 +6385,285 @@ on true;
-> Seq Scan on int8_tbl y
(7 rows)
+------------------- UPDATE SJE (self join elimination) applicable cases.
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sz.a = sq.a and (sq.b = 2 or sq.a = 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on sj sz
+ -> Seq Scan on sj sz
+ Filter: ((a IS NOT NULL) AND ((b = 2) OR (a = 2)))
+(3 rows)
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + 1 FROM t1 WHERE t1.b = sj.b and t1.a = 3 and sj.a = 3;
+ QUERY PLAN
+-----------------------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: ((b IS NOT NULL) AND (a = 3))
+(3 rows)
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT *, (select count(*) from tenk1) FROM sj) UPDATE sj SET a = sj.a + 1 FROM t1 WHERE t1.a = sj.a;
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+------------------- UPDATE SJE not applicable cases.
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sz.a = sq.a or (sq.b = 2 or sq.a = 2);
+ QUERY PLAN
+------------------------------------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: ((sz.a = sq.a) OR (sq.b = 2) OR (sq.a = 2))
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.b and sq.b = sz.b and sz.b = sq.a;
+ QUERY PLAN
+------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.b)
+ -> Seq Scan on sj sq
+ Filter: (a = b)
+ -> Seq Scan on sj sz
+(6 rows)
+
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.b = sz.b and sz.a = 2 and sq.a = 3;
+ QUERY PLAN
+------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.b = sz.b)
+ -> Seq Scan on sj sq
+ Filter: (a = 3)
+ -> Seq Scan on sj sz
+ Filter: (a = 2)
+(7 rows)
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + 1 FROM t1 WHERE t1.b = sj.b and t1.a = 2 and sj.a = 3;
+ QUERY PLAN
+--------------------------------------
+ Update on sj
+ -> Nested Loop
+ Join Filter: (sj.b = sj_1.b)
+ -> Seq Scan on sj
+ Filter: (a = 3)
+ -> Seq Scan on sj sj_1
+ Filter: (a = 2)
+(7 rows)
+
+------------------- DELETE SJE applicable cases.
+EXPLAIN (COSTS OFF) DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a;
+ QUERY PLAN
+---------------------------------
+ Delete on sj sz
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF) DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and (sz.a = 2 or sz.b =3);
+ QUERY PLAN
+------------------------------------------------------------
+ Delete on sj sz
+ -> Seq Scan on sj sz
+ Filter: ((a IS NOT NULL) AND ((a = 2) OR (b = 3)))
+(3 rows)
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 3 and sq.a = 3;
+ QUERY PLAN
+-----------------------------------------------
+ Delete on sj t1
+ -> Seq Scan on sj t1
+ Filter: ((b IS NOT NULL) AND (a = 3))
+(3 rows)
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a;
+ QUERY PLAN
+---------------------------------
+ Delete on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+------------------- DELETE SJE not applicable cases.
+EXPLAIN (COSTS OFF) DELETE FROM sj sq USING sj as sz WHERE sq.a = sz.b and sq.b = sz.b and sz.b = sq.a;
+ QUERY PLAN
+------------------------------------
+ Delete on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.b)
+ -> Seq Scan on sj sq
+ Filter: (a = b)
+ -> Seq Scan on sj sz
+(6 rows)
+
+EXPLAIN (COSTS OFF) DELETE FROM sj sq USING sj as sz WHERE sq.b = sz.b and sz.a = 2 and sq.a = 3;
+ QUERY PLAN
+------------------------------------
+ Delete on sj sq
+ -> Nested Loop
+ Join Filter: (sq.b = sz.b)
+ -> Seq Scan on sj sq
+ Filter: (a = 3)
+ -> Seq Scan on sj sz
+ Filter: (a = 2)
+(7 rows)
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 3;
+ QUERY PLAN
+------------------------------------
+ Delete on sj sq
+ -> Nested Loop
+ Join Filter: (sq.b = t1.b)
+ -> Seq Scan on sj t1
+ Filter: (a = 3)
+ -> Seq Scan on sj sq
+(6 rows)
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.a = sq.c;
+ QUERY PLAN
+-------------------------------------
+ Delete on sj sq
+ -> Nested Loop
+ Join Filter: (sq.c = t1.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj t1
+(6 rows)
+
+------------------- MERGE SJE table setup
+CREATE TABLE sj_target (tid integer primary key, balance integer) WITH (autovacuum_enabled=off);
+INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5, 50), (6, 60);
+create view rw_sj_target as select * from sj_target where tid >= 2;
+create or replace view no_rw_sj_target as select * from sj_target where balance = 60 limit 1;
+--cannot use SJE for RETURNING
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN update set balance = t.balance + 11 RETURNING *;
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop
+ -> Seq Scan on sj_target t
+ -> Index Scan using sj_target_pkey on sj_target s
+ Index Cond: (tid = t.tid)
+(5 rows)
+
+--cannot use SJE for merge insert
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN NOT MATCHED AND s.balance = 10
+ THEN INSERT VALUES (s.tid, s.balance);
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop Left Join
+ -> Seq Scan on sj_target s
+ -> Index Scan using sj_target_pkey on sj_target t
+ Index Cond: (tid = s.tid)
+(5 rows)
+
+--cannot use SJE for merge with non-updatable view
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING no_rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 60
+ THEN update set balance = t.balance + 6;
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop
+ -> Subquery Scan on s
+ -> Limit
+ -> Seq Scan on sj_target
+ Filter: (balance = 60)
+ -> Index Scan using sj_target_pkey on sj_target t
+ Index Cond: (tid = s.tid)
+(8 rows)
+
+---- the following cases can apply SJE with merge.
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 20
+ THEN update set balance = t.balance + 2;
+ QUERY PLAN
+-------------------------------------------------
+ Merge on sj_target
+ -> Bitmap Heap Scan on sj_target
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(5 rows)
+
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN update set balance = t.balance + 2;
+ QUERY PLAN
+-------------------------------------------------
+ Merge on sj_target
+ -> Bitmap Heap Scan on sj_target
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(5 rows)
+
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30
+ THEN update set balance = t.balance + 3;
+ QUERY PLAN
+-------------------------------------------------
+ Merge on sj_target
+ -> Bitmap Heap Scan on sj_target
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(5 rows)
+
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40
+ THEN update set balance = t.balance + 4;
+ QUERY PLAN
+-------------------------------
+ Merge on sj_target s
+ -> Seq Scan on sj_target s
+(2 rows)
+
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50
+ THEN DELETE;
+ QUERY PLAN
+-------------------------------
+ Merge on sj_target s
+ -> Seq Scan on sj_target s
+(2 rows)
+
+--- and run the actual query.
+MERGE INTO sj_target t USING no_rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 60 THEN update set balance = t.balance + 6;
+MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 20 THEN update set balance = t.balance + 2;
+MERGE INTO rw_sj_target t USING rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30 THEN update set balance = t.balance + 3;
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40 THEN update set balance = t.balance + 4;
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50 THEN DELETE;
+MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10 THEN update set balance = t.balance + 1;
+select * from sj_target order by tid;
+ tid | balance
+-----+---------
+ 1 | 10
+ 2 | 22
+ 3 | 33
+ 4 | 44
+ 6 | 66
+(5 rows)
+
+DROP VIEW no_rw_sj_target;
+DROP VIEW rw_sj_target;
+DROP TABLE sj_target;
-- Test that references to the removed rel in lateral subqueries are replaced
-- correctly after join removal
explain (verbose, costs off)
@@ -7003,29 +7282,23 @@ WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
TRUNCATE emp1;
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
- QUERY PLAN
--------------------------------------
- Update on sj sq
- -> Nested Loop
- Join Filter: (sq.a = sz.a)
- -> Seq Scan on sj sq
- -> Materialize
- -> Seq Scan on sj sz
-(6 rows)
+ QUERY PLAN
+---------------------------------
+ Update on sj sz
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
CREATE RULE sj_del_rule AS ON DELETE TO sj
DO INSTEAD
UPDATE sj SET a = 1 WHERE a = old.a;
EXPLAIN (COSTS OFF) DELETE FROM sj;
- QUERY PLAN
---------------------------------------
- Update on sj sj_1
- -> Nested Loop
- Join Filter: (sj.a = sj_1.a)
- -> Seq Scan on sj sj_1
- -> Materialize
- -> Seq Scan on sj
-(6 rows)
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
DROP RULE sj_del_rule ON sj CASCADE;
-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1d1f568b..dd5b3cd4 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -3111,16 +3111,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
- Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ QUERY PLAN
+--------------------------------------------------
+ Update on base_tbl
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 0cc6c692..c1d7144c 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2426,6 +2426,103 @@ left join (select coalesce(y.q1, 1) from int8_tbl y
on true) z
on true;
+------------------- UPDATE SJE (self join elimination) applicable cases.
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sz.a = sq.a and (sq.b = 2 or sq.a = 2);
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + 1 FROM t1 WHERE t1.b = sj.b and t1.a = 3 and sj.a = 3;
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT *, (select count(*) from tenk1) FROM sj) UPDATE sj SET a = sj.a + 1 FROM t1 WHERE t1.a = sj.a;
+
+
+------------------- UPDATE SJE not applicable cases.
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sz.a = sq.a or (sq.b = 2 or sq.a = 2);
+
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.b and sq.b = sz.b and sz.b = sq.a;
+
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.b = sz.b and sz.a = 2 and sq.a = 3;
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + 1 FROM t1 WHERE t1.b = sj.b and t1.a = 2 and sj.a = 3;
+
+------------------- DELETE SJE applicable cases.
+EXPLAIN (COSTS OFF) DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a;
+
+EXPLAIN (COSTS OFF) DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and (sz.a = 2 or sz.b =3);
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 3 and sq.a = 3;
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a;
+
+
+------------------- DELETE SJE not applicable cases.
+EXPLAIN (COSTS OFF) DELETE FROM sj sq USING sj as sz WHERE sq.a = sz.b and sq.b = sz.b and sz.b = sq.a;
+
+EXPLAIN (COSTS OFF) DELETE FROM sj sq USING sj as sz WHERE sq.b = sz.b and sz.a = 2 and sq.a = 3;
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 3;
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.a = sq.c;
+
+------------------- MERGE SJE table setup
+CREATE TABLE sj_target (tid integer primary key, balance integer) WITH (autovacuum_enabled=off);
+INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5, 50), (6, 60);
+create view rw_sj_target as select * from sj_target where tid >= 2;
+create or replace view no_rw_sj_target as select * from sj_target where balance = 60 limit 1;
+
+--cannot use SJE for RETURNING
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN update set balance = t.balance + 11 RETURNING *;
+--cannot use SJE for merge insert
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN NOT MATCHED AND s.balance = 10
+ THEN INSERT VALUES (s.tid, s.balance);
+
+--cannot use SJE for merge with non-updatable view
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING no_rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 60
+ THEN update set balance = t.balance + 6;
+
+---- the following cases can apply SJE with merge.
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 20
+ THEN update set balance = t.balance + 2;
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN update set balance = t.balance + 2;
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30
+ THEN update set balance = t.balance + 3;
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40
+ THEN update set balance = t.balance + 4;
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50
+ THEN DELETE;
+
+--- and run the actual query.
+MERGE INTO sj_target t USING no_rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 60 THEN update set balance = t.balance + 6;
+
+MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 20 THEN update set balance = t.balance + 2;
+
+MERGE INTO rw_sj_target t USING rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30 THEN update set balance = t.balance + 3;
+
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40 THEN update set balance = t.balance + 4;
+
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50 THEN DELETE;
+
+MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10 THEN update set balance = t.balance + 1;
+
+select * from sj_target order by tid;
+DROP VIEW no_rw_sj_target;
+DROP VIEW rw_sj_target;
+DROP TABLE sj_target;
+
-- Test that references to the removed rel in lateral subqueries are replaced
-- correctly after join removal
explain (verbose, costs off)
--
2.34.1
v2-0001-Remove-useless-self-joins.patchtext/x-patch; charset=UTF-8; name=v2-0001-Remove-useless-self-joins.patchDownload
From 82480cc0649ab986024e39f4ccae462f5173d2ad Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 14 Jun 2024 17:14:16 +0800
Subject: [PATCH v2 1/2] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1228 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 7 +-
src/backend/rewrite/rewriteManip.c | 95 ++
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 35 +-
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/include/rewrite/rewriteManip.h | 2 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1064 ++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 486 ++++++++
src/tools/pgindent/typedefs.list | 3 +
17 files changed, 2966 insertions(+), 82 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 698169af..7918c4ba 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5593,6 +5593,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c0fcc7d7..2230b131 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3440,6 +3440,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3495,6 +3511,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3546,6 +3563,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3588,7 +3623,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index c3fd4a81..bb145977 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -29,12 +30,31 @@
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. It is needed to perform a sorting procedure and simplify
+ * the search of SJE-candidate baserels referencing the same database relation.
+ * Having collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts to
+ * remove self-joins.
+ * Preliminary sorting prevents quadratic behaviour that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -42,14 +62,16 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -87,7 +109,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -291,8 +313,8 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
continue; /* not mergejoinable */
/*
- * Check if the clause has the form "outer op inner" or "inner op
- * outer", and if so mark which side is inner.
+ * Check if clause has the form "outer op inner" or "inner op outer",
+ * and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
innerrel->relids))
@@ -306,7 +328,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -318,31 +340,27 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
- * Remove references to the rel from other baserels' attr_needed arrays.
+ * Remove references to the rel from other baserels' attr_needed arrays
+ * and lateral_vars lists.
*/
for (rti = 1; rti < root->simple_rel_array_size; rti++)
{
@@ -364,19 +382,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ replace_relid(otherrel->attr_needed[attroff], relid, subst);
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], ojrelid);
+ replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
+
+ /* Update lateral_vars list. */
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -390,30 +412,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- /*
- * initsplan.c is fairly cavalier about allowing SpecialJoinInfos'
- * lefthand/righthand relid sets to be shared with other data
- * structures. Ensure that we don't modify the original relid sets.
- * (The commute_xxx sets are always per-SpecialJoinInfo though.)
- */
- sjinf->min_lefthand = bms_copy(sjinf->min_lefthand);
- sjinf->min_righthand = bms_copy(sjinf->min_righthand);
- sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
- sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
+
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
}
/*
@@ -434,10 +449,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -447,18 +462,58 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
/*
* Remove any joinquals referencing the rel from the joininfo lists.
@@ -856,9 +911,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the outer_exprs, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -871,10 +931,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1188,9 +1249,33 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
+ * additional clauses from a baserestrictinfo list that were used to prove
+ * uniqueness. A non NULL 'extra_clauses' indicates that we're checking
+ * for self-join and correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1205,17 +1290,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels, because we need extra clauses to be valid for our case.
+ * Also, for self-join checking we've filtered the clauses list. Thus,
+ * for a self-join search, we can match only the result cached for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1232,7 +1328,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1245,10 +1342,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1294,7 +1397,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1324,17 +1428,987 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+ ListCell *lc;
+ ListCell *lc1;
+
+ foreach(lc, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach(lc1, new_members)
+ {
+ EquivalenceMember *other = lfirst_node(EquivalenceMember, lc1);
+
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach(lc, ec->ec_sources)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach(lc1, new_sources)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, lc1);
+
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+typedef struct
+{
+ int rt_index;
+ int new_index;
+ int cnt;
+ int sublevels_up;
+} restore_rangetblref_context;
+
+static bool
+restore_rangetblref_walker(Node *node, restore_rangetblref_context *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, RangeTblRef))
+ {
+ RangeTblRef *rtr = (RangeTblRef *) node;
+
+ if (context->sublevels_up == 0 && context->cnt == 0 &&
+ rtr->rtindex == context->rt_index)
+ {
+ rtr->rtindex = context->new_index;
+ context->cnt++;
+ }
+ //TODO, here, should we return false immediately.
+ // return false;
+ }
+
+ if (IsA(node, Query))
+ {
+ /* Recurse into subselects */
+ bool result;
+
+ context->sublevels_up++;
+ result = query_tree_walker((Query *) node, restore_rangetblref_walker,
+ (void *) context, 0);
+ context->sublevels_up--;
+ return result;
+ }
+ return expression_tree_walker(node, restore_rangetblref_walker,
+ (void *) context);
+}
+
+/*
+ * in remove_self_join_rel, ChangeVarNodes may change some of
+ * the RangeTblRef->rtindex in the root Query, here we do the change it back.
+ *
+ * e.g. ChangeVarNodes maybe change
+ * root->parse subnode JoinExpr's larg, rarg (RangeTblRef->rtindex)
+ * from (1,2) to (2,2), we need to change it back to (1,2).
+ * For node JoinExpr, the query tree WALK logic is through larg,then rarg,
+ * we will change the first "2" to "1", not the second.
+ *
+*/
+static void
+restore_rangetblref(Node *node, int rt_index, int new_index, int cnt, int sublevels_up)
+{
+ restore_rangetblref_context context;
+ context.rt_index = rt_index;
+ context.new_index = new_index;
+ context.cnt = cnt;
+ context.sublevels_up = sublevels_up;
+
+ if (node && IsA(node, Query))
+ {
+ Query *qry = (Query *) node;
+ query_tree_walker(qry, restore_rangetblref_walker,
+ (void *) &context, 0);
+ }
+ else
+ restore_rangetblref_walker(node, &context);
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach(lc, joininfos)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach(lc, toRemove->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach(lc, binfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach(lc, jinfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->joininfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures */
+ ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0);
+ /* restore the rangetblref in a proper order. */
+ restore_rangetblref((Node *) root->parse, toKeep->relid, toRemove->relid, 0, 0);
+
+ /* See remove_self_joins_one_group() */
+ Assert(root->parse->resultRelation != toRemove->relid);
+ Assert(root->parse->resultRelation != toKeep->relid);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ replace_relid(root->all_result_relids, toRemove->relid, toKeep->relid);
+ replace_relid(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ /*
+ * We don't accept result relation as either source or target relation
+ * of SJE, because result relation has different behavior in
+ * EvalPlanQual() and RETURNING clause.
+ */
+ if (root->parse->resultRelation == r)
+ continue;
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ if (root->parse->resultRelation == k)
+ continue;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be removed,
+ * and these relations should not have TABLESAMPLE clauses
+ * specified. Removing a relation with TABLESAMPLE clause could
+ * potentially change the syntax of the query.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5..075d36c7 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -230,6 +230,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 1c69c6e9..c5a8d00b 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -662,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
@@ -670,6 +670,9 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
* be an RTE corresponding to each setop's output.
+ * Note, we use this not subquery's targetlist but subroot->parse's
+ * targetlist, because it was revised by self-join removal. subquery's
+ * targetlist might contain the references to the removed relids.
*/
if (pNumGroups)
{
@@ -682,7 +685,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index 191f2dc0..85b0d91f 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -540,6 +540,33 @@ offset_relid_set(Relids relids, int offset)
return result;
}
+/*
+ * Substitute newId by oldId in relids.
+ *
+ * We must make a copy of the original Bitmapset before making any
+ * modifications, because the same pointer to it might be shared among
+ * different places.
+ * Also, this function can be used in 'delete only' mode (newId < 0).
+ * It allows us to utilise the same code in the remove_useless_joins and the
+ * remove_self_joins features.
+ */
+Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (oldId < 0)
+ return relids;
+
+ /* Delete relid without substitution. */
+ if (newId < 0)
+ return bms_del_member(bms_copy(relids), oldId);
+
+ /* Substitute newId for oldId. */
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
/*
* ChangeVarNodes - adjust Var nodes for a specific change of RT index
*
@@ -637,6 +664,74 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ replace_relid(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ replace_relid(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ replace_relid(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ replace_relid(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ replace_relid(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ replace_relid(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * for SJE (self join elimination), changing varnos will make t1.a = t2.a
+ * to "t1.a = t1.a", which is always true, later processing it will be
+ * optimized out, but we also need to add a not null qual (NullTest).
+ *
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 46c258be..d17a4c70 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -986,6 +986,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 2ba297c1..b59602a1 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -728,7 +728,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -967,7 +967,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3432,4 +3432,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 5e88c022..fff4b69d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index aafc1737..f2e3fa4c 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -108,6 +109,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index ac6d2049..11593d36 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -41,6 +42,7 @@ typedef enum ReplaceVarsNoMatchOption
} ReplaceVarsNoMatchOption;
+extern Bitmapset *replace_relid(Relids relids, int oldId, int newId);
extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
List *src_rtable, List *src_perminfos);
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047..3d5de283 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 6b16c3a6..16327779 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6158,6 +6158,1070 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: (oid < '10'::oid)
+(6 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+-------------------------------------------
+ HashSetOp Except All
+ -> Append
+ -> Subquery Scan on "*SELECT* 1"
+ -> Seq Scan on emp1 c2
+ -> Subquery Scan on "*SELECT* 2"
+ -> Seq Scan on emp1 c3
+(6 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index dbfd0c13..2f3eb4e7 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -153,10 +153,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a31..77dd964e 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 8bfe3b7b..0cc6c692 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2321,6 +2321,492 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 61ad417c..4039ba7e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -388,6 +388,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2563,6 +2564,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -3990,6 +3992,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
base-commit: f1affb67055c9b3f31a7ee7eb521a9ba64fff488
--
2.34.1
On Mon, Jun 17, 2024 at 8:00 AM jian he <jian.universality@gmail.com> wrote:
Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
result relation). While I don't claim that that destroyed the entire
use case for SJE, it certainly knocked its usefulness down by many
notches, maybe even to the point where it's not worth putting in the
effort needed to get it to re-committability. So I think we need to
look harder at finding a way around that. Is the concern that
RETURNING should return either old or new values depending on which
RTE is mentioned? If so, maybe the feature Dean has proposed to
allow RETURNING to access old values [1] is a prerequisite to moving
forward. Alternatively, perhaps it'd be good enough to forbid SJE
only when the non-target relation is actually mentioned in RETURNING.regards, tom lane
[1] /messages/by-id/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
attached patchset:
v3-0001-Remove-useless-self-joins.patch
v3-0002-make-SJE-to-apply-DML-MERGE-UPDATE-INSERT-DELETE.patch
v3-0003-use-SJE-conditionally-with-delete-update-merge-RE.patch
v3-0001: based on [1]/messages/by-id/55f680bc-756d-4dd3-ab27-3c6e663b0e4c@postgrespro.ru patch, replace all replace_varno to ChangeVarNodes.
v3-0002: make SJE apply to UPDATE/DELETE/MERGE
v3-0003: make SJE apply to UPDATE/DELETE/MERGE RETURNING conditionally
v3-0001, v3-0002 content, reasoning is the same as the previous thread[2]/messages/by-id/CACJufxG3sqJKe1OskHhn7OCdtrEeeRFcD8R4TTQE+LGJEQaL9w@mail.gmail.com.
v3-0003 is new.
to make sure it's correct, I have added a lot of tests,
Some of this may be contrived, maybe some of the tests are redundant.
[1]: /messages/by-id/55f680bc-756d-4dd3-ab27-3c6e663b0e4c@postgrespro.ru
[2]: /messages/by-id/CACJufxG3sqJKe1OskHhn7OCdtrEeeRFcD8R4TTQE+LGJEQaL9w@mail.gmail.com
Attachments:
v3-0002-make-SJE-to-apply-DML-MERGE-UPDATE-INSERT-DELETE.patchtext/x-patch; charset=US-ASCII; name=v3-0002-make-SJE-to-apply-DML-MERGE-UPDATE-INSERT-DELETE.patchDownload
From bc5511acfa7acefad676ba6d15c4b181f2edcc5f Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Wed, 26 Jun 2024 14:02:01 +0800
Subject: [PATCH v3 2/3] make SJE to apply DML (MERGE, UPDATE, INSERT, DELETE)
replace_varno and replace_varno_walker didn't replace
Query->resultRelation, Query->mergeTargetRelation
as ChangeVarNodes did.
So replace_varno will have problem with DELETE, UPDATE, MERGE.
ChangeVarNodes solved this problem.
* from the INSERT synopsis, we can see that INSERT cannot join with another table,
so SJE cannot be used.
DELETE/UPDATE
excerpt from the synopsis:
UPDATE table_name FROM from_item
DELETE FROM table_name USING from_item
as you can see UPDATE/DELETE, the join order is fixed. so we only need to check
if the table_name is totally replaced by from_item or not.
MERGE command: (merge_insert, merge_update, merge_delete).
for merge_insert: SJE cannot to applicable. it will out in follow code,
for which, i cannot fully explain the reasoning:
```
/*
* It is impossible to eliminate join of two relations if they
* belong to different rules of order. Otherwise planner can't be
* able to find any variants of correct query plan.
*/
foreach(lc, root->join_info_list)
{
SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
if ((bms_is_member(k, info->syn_lefthand) ^
bms_is_member(r, info->syn_lefthand)) ||
(bms_is_member(k, info->syn_righthand) ^
bms_is_member(r, info->syn_righthand)))
{
jinfo_check = false;
break;
}
}
if (!jinfo_check)
continue;
```
merge_delete and merge_update works fine with base table and updateable view.
but SJE cannot work with non-updatable view.
---
src/backend/optimizer/plan/analyzejoins.c | 21 +-
src/test/regress/expected/join.out | 353 +++++++++++++++++-
src/test/regress/expected/updatable_views.out | 17 +-
src/test/regress/sql/join.sql | 122 ++++++
4 files changed, 470 insertions(+), 43 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index bb145977..eab39ee0 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1860,10 +1860,6 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
/* restore the rangetblref in a proper order. */
restore_rangetblref((Node *) root->parse, toKeep->relid, toRemove->relid, 0, 0);
- /* See remove_self_joins_one_group() */
- Assert(root->parse->resultRelation != toRemove->relid);
- Assert(root->parse->resultRelation != toKeep->relid);
-
/* Replace links in the planner info */
remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
@@ -2046,14 +2042,6 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
{
RelOptInfo *inner = root->simple_rel_array[r];
- /*
- * We don't accept result relation as either source or target relation
- * of SJE, because result relation has different behavior in
- * EvalPlanQual() and RETURNING clause.
- */
- if (root->parse->resultRelation == r)
- continue;
-
k = r;
while ((k = bms_next_member(relids, k)) > 0)
@@ -2069,9 +2057,6 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
PlanRowMark *imark = NULL;
List *uclauses = NIL;
- if (root->parse->resultRelation == k)
- continue;
-
/* A sanity check: the relations have the same Oid. */
Assert(root->simple_rte_array[k]->relid ==
root->simple_rte_array[r]->relid);
@@ -2391,6 +2376,12 @@ remove_useless_self_joins(PlannerInfo *root, List *joinlist)
(list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
return joinlist;
+ /*
+ * we don't accept RETURNING clause with SJE.
+ */
+ if (root->parse && root->parse->returningList != NIL)
+ return joinlist;
+
/*
* Merge pairs of relations participated in self-join. Remove unnecessary
* range table entries.
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 16327779..9f89b351 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6385,6 +6385,329 @@ on true;
-> Seq Scan on int8_tbl y
(7 rows)
+--truncate sj, refill data.
+truncate sj;
+insert into sj(a,b,c) select g, g % 10, g + 10 from generate_series(1, 10) g;
+analyze sj;
+------------------- UPDATE SJE (self join elimination) applicable cases.
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a and (sq.b = 2 or sq.a = 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on sj sz
+ -> Seq Scan on sj sz
+ Filter: ((a IS NOT NULL) AND ((b = 2) OR (a = 2)))
+(3 rows)
+
+UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a and (sq.b = 2 or sq.a = 2);
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + t1.a + t1.c FROM t1 WHERE t1.b = sj.b and t1.a = 3 and sj.a = 3;
+ QUERY PLAN
+-----------------------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: ((b IS NOT NULL) AND (a = 3))
+(3 rows)
+
+WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + t1.a + t1.c FROM t1 WHERE t1.b = sj.b and t1.a = 3 and sj.a = 3;
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) UPDATE sj SET a = sj.a + t1.b + t1.a FROM t1 WHERE t1.a = sj.a and t1.a = 4;
+ QUERY PLAN
+-------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a = 4)
+(3 rows)
+
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) UPDATE sj SET a = sj.a + t1.b + t1.a FROM t1 WHERE t1.a = sj.a and t1.a = 4;
+select * from sj order by a;
+ a | b | c
+----+---+----
+ 1 | 1 | 11
+ 2 | 4 | 12
+ 5 | 5 | 15
+ 6 | 6 | 16
+ 7 | 7 | 17
+ 8 | 8 | 18
+ 9 | 9 | 19
+ 10 | 0 | 20
+ 12 | 4 | 14
+ 19 | 3 | 13
+(10 rows)
+
+------------------- UPDATE SJE not applicable cases.
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sz.a = sq.a or (sq.b = 2 or sq.a = 2);
+ QUERY PLAN
+------------------------------------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: ((sz.a = sq.a) OR (sq.b = 2) OR (sq.a = 2))
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.b and sq.b = sz.b and sz.b = sq.a;
+ QUERY PLAN
+------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.b)
+ -> Seq Scan on sj sq
+ Filter: (a = b)
+ -> Seq Scan on sj sz
+(6 rows)
+
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.b = sz.b and sz.a = 2 and sq.a = 3;
+ QUERY PLAN
+------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.b = sz.b)
+ -> Seq Scan on sj sq
+ Filter: (a = 3)
+ -> Seq Scan on sj sz
+ Filter: (a = 2)
+(7 rows)
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + 1 FROM t1 WHERE t1.b = sj.b and t1.a = 2 and sj.a = 3;
+ QUERY PLAN
+--------------------------------------
+ Update on sj
+ -> Nested Loop
+ Join Filter: (sj.b = sj_1.b)
+ -> Seq Scan on sj
+ Filter: (a = 3)
+ -> Seq Scan on sj sj_1
+ Filter: (a = 2)
+(7 rows)
+
+------------------- DELETE SJE applicable cases.
+EXPLAIN (COSTS OFF) DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and sq.a = 5;
+ QUERY PLAN
+-------------------------
+ Delete on sj sz
+ -> Seq Scan on sj sz
+ Filter: (a = 5)
+(3 rows)
+
+DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and sq.a = 5;
+EXPLAIN (COSTS OFF) DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and (sz.a = 6 or sz.b = 113);
+ QUERY PLAN
+--------------------------------------------------------------
+ Delete on sj sz
+ -> Seq Scan on sj sz
+ Filter: ((a IS NOT NULL) AND ((a = 6) OR (b = 113)))
+(3 rows)
+
+DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and (sz.a = 6 or sz.b = 113);
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 7 and sq.a = 7;
+ QUERY PLAN
+-----------------------------------------------
+ Delete on sj t1
+ -> Seq Scan on sj t1
+ Filter: ((b IS NOT NULL) AND (a = 7))
+(3 rows)
+
+WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 7 and sq.a = 7;
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 8;
+ QUERY PLAN
+-------------------------
+ Delete on sj
+ -> Seq Scan on sj
+ Filter: (a = 8)
+(3 rows)
+
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 8;
+select * from sj order by a;
+ a | b | c
+----+---+----
+ 1 | 1 | 11
+ 2 | 4 | 12
+ 9 | 9 | 19
+ 10 | 0 | 20
+ 12 | 4 | 14
+ 19 | 3 | 13
+(6 rows)
+
+------------------- DELETE SJE not applicable cases.
+EXPLAIN (COSTS OFF) DELETE FROM sj sq USING sj as sz WHERE sq.a = sz.b and sq.b = sz.b and sz.b = sq.a;
+ QUERY PLAN
+------------------------------------
+ Delete on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.b)
+ -> Seq Scan on sj sq
+ Filter: (a = b)
+ -> Seq Scan on sj sz
+(6 rows)
+
+EXPLAIN (COSTS OFF) DELETE FROM sj sq USING sj as sz WHERE sq.b = sz.b and sz.a = 2 and sq.a = 3;
+ QUERY PLAN
+------------------------------------
+ Delete on sj sq
+ -> Nested Loop
+ Join Filter: (sq.b = sz.b)
+ -> Seq Scan on sj sq
+ Filter: (a = 3)
+ -> Seq Scan on sj sz
+ Filter: (a = 2)
+(7 rows)
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 3;
+ QUERY PLAN
+------------------------------------
+ Delete on sj sq
+ -> Nested Loop
+ Join Filter: (sq.b = t1.b)
+ -> Seq Scan on sj t1
+ Filter: (a = 3)
+ -> Seq Scan on sj sq
+(6 rows)
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.a = sq.c;
+ QUERY PLAN
+-------------------------------------
+ Delete on sj sq
+ -> Nested Loop
+ Join Filter: (sq.c = t1.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj t1
+(6 rows)
+
+------------------- MERGE SJE table setup
+CREATE TABLE sj_target (tid integer primary key, balance integer) WITH (autovacuum_enabled=off);
+INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5, 50), (6, 60);
+create view rw_sj_target as select * from sj_target where tid >= 2;
+create or replace view no_rw_sj_target as select * from sj_target where balance = 60 limit 1;
+--cannot use SJE for RETURNING
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN update set balance = t.balance + 11 RETURNING *;
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop
+ -> Seq Scan on sj_target t
+ -> Index Scan using sj_target_pkey on sj_target s
+ Index Cond: (tid = t.tid)
+(5 rows)
+
+--cannot use SJE for merge insert
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN NOT MATCHED AND s.balance = 10
+ THEN INSERT VALUES (s.tid, s.balance);
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop Left Join
+ -> Seq Scan on sj_target s
+ -> Index Scan using sj_target_pkey on sj_target t
+ Index Cond: (tid = s.tid)
+(5 rows)
+
+--cannot use SJE for merge with non-updatable view
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING no_rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 60
+ THEN update set balance = t.balance + 6;
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop
+ -> Subquery Scan on s
+ -> Limit
+ -> Seq Scan on sj_target
+ Filter: (balance = 60)
+ -> Index Scan using sj_target_pkey on sj_target t
+ Index Cond: (tid = s.tid)
+(8 rows)
+
+---- the following cases can apply SJE with merge.
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 20
+ THEN update set balance = t.balance + 2;
+ QUERY PLAN
+-------------------------------------------------
+ Merge on sj_target
+ -> Bitmap Heap Scan on sj_target
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(5 rows)
+
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN update set balance = t.balance + 2;
+ QUERY PLAN
+-------------------------------------------------
+ Merge on sj_target
+ -> Bitmap Heap Scan on sj_target
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(5 rows)
+
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30
+ THEN update set balance = t.balance + 3;
+ QUERY PLAN
+-------------------------------------------------
+ Merge on sj_target
+ -> Bitmap Heap Scan on sj_target
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(5 rows)
+
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40
+ THEN update set balance = t.balance + 4;
+ QUERY PLAN
+-------------------------------
+ Merge on sj_target s
+ -> Seq Scan on sj_target s
+(2 rows)
+
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50
+ THEN DELETE;
+ QUERY PLAN
+-------------------------------
+ Merge on sj_target s
+ -> Seq Scan on sj_target s
+(2 rows)
+
+--- and run the actual query.
+MERGE INTO sj_target t USING no_rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 60 THEN update set balance = t.balance + 6;
+MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 20 THEN update set balance = t.balance + 2;
+MERGE INTO rw_sj_target t USING rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30 THEN update set balance = t.balance + 3;
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40 THEN update set balance = t.balance + 4;
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50 THEN DELETE;
+MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10 THEN update set balance = t.balance + 1;
+select * from sj_target order by tid;
+ tid | balance
+-----+---------
+ 1 | 10
+ 2 | 22
+ 3 | 33
+ 4 | 44
+ 6 | 66
+(5 rows)
+
+DROP VIEW no_rw_sj_target;
+DROP VIEW rw_sj_target;
+DROP TABLE sj_target;
+truncate sj;
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
-- Test that references to the removed rel in lateral subqueries are replaced
-- correctly after join removal
explain (verbose, costs off)
@@ -7003,29 +7326,23 @@ WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
TRUNCATE emp1;
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
- QUERY PLAN
--------------------------------------
- Update on sj sq
- -> Nested Loop
- Join Filter: (sq.a = sz.a)
- -> Seq Scan on sj sq
- -> Materialize
- -> Seq Scan on sj sz
-(6 rows)
+ QUERY PLAN
+---------------------------------
+ Update on sj sz
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
CREATE RULE sj_del_rule AS ON DELETE TO sj
DO INSTEAD
UPDATE sj SET a = 1 WHERE a = old.a;
EXPLAIN (COSTS OFF) DELETE FROM sj;
- QUERY PLAN
---------------------------------------
- Update on sj sj_1
- -> Nested Loop
- Join Filter: (sj.a = sj_1.a)
- -> Seq Scan on sj sj_1
- -> Materialize
- -> Seq Scan on sj
-(6 rows)
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
DROP RULE sj_del_rule ON sj CASCADE;
-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 1d1f568b..dd5b3cd4 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -3111,16 +3111,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
- Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ QUERY PLAN
+--------------------------------------------------
+ Update on base_tbl
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 0cc6c692..4afbaf79 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2426,6 +2426,128 @@ left join (select coalesce(y.q1, 1) from int8_tbl y
on true) z
on true;
+--truncate sj, refill data.
+truncate sj;
+insert into sj(a,b,c) select g, g % 10, g + 10 from generate_series(1, 10) g;
+analyze sj;
+------------------- UPDATE SJE (self join elimination) applicable cases.
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a and (sq.b = 2 or sq.a = 2);
+
+UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a and (sq.b = 2 or sq.a = 2);
+
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + t1.a + t1.c FROM t1 WHERE t1.b = sj.b and t1.a = 3 and sj.a = 3;
+
+WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + t1.a + t1.c FROM t1 WHERE t1.b = sj.b and t1.a = 3 and sj.a = 3;
+
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) UPDATE sj SET a = sj.a + t1.b + t1.a FROM t1 WHERE t1.a = sj.a and t1.a = 4;
+
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) UPDATE sj SET a = sj.a + t1.b + t1.a FROM t1 WHERE t1.a = sj.a and t1.a = 4;
+
+select * from sj order by a;
+------------------- UPDATE SJE not applicable cases.
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sz.a = sq.a or (sq.b = 2 or sq.a = 2);
+
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.b and sq.b = sz.b and sz.b = sq.a;
+
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.b = sz.b and sz.a = 2 and sq.a = 3;
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + 1 FROM t1 WHERE t1.b = sj.b and t1.a = 2 and sj.a = 3;
+
+------------------- DELETE SJE applicable cases.
+EXPLAIN (COSTS OFF) DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and sq.a = 5;
+
+DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and sq.a = 5;
+
+EXPLAIN (COSTS OFF) DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and (sz.a = 6 or sz.b = 113);
+
+DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and (sz.a = 6 or sz.b = 113);
+
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 7 and sq.a = 7;
+
+WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 7 and sq.a = 7;
+
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 8;
+
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 8;
+
+select * from sj order by a;
+------------------- DELETE SJE not applicable cases.
+EXPLAIN (COSTS OFF) DELETE FROM sj sq USING sj as sz WHERE sq.a = sz.b and sq.b = sz.b and sz.b = sq.a;
+
+EXPLAIN (COSTS OFF) DELETE FROM sj sq USING sj as sz WHERE sq.b = sz.b and sz.a = 2 and sq.a = 3;
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 3;
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.a = sq.c;
+
+------------------- MERGE SJE table setup
+CREATE TABLE sj_target (tid integer primary key, balance integer) WITH (autovacuum_enabled=off);
+INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5, 50), (6, 60);
+create view rw_sj_target as select * from sj_target where tid >= 2;
+create or replace view no_rw_sj_target as select * from sj_target where balance = 60 limit 1;
+
+--cannot use SJE for RETURNING
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN update set balance = t.balance + 11 RETURNING *;
+--cannot use SJE for merge insert
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN NOT MATCHED AND s.balance = 10
+ THEN INSERT VALUES (s.tid, s.balance);
+
+--cannot use SJE for merge with non-updatable view
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING no_rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 60
+ THEN update set balance = t.balance + 6;
+
+---- the following cases can apply SJE with merge.
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 20
+ THEN update set balance = t.balance + 2;
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN update set balance = t.balance + 2;
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30
+ THEN update set balance = t.balance + 3;
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40
+ THEN update set balance = t.balance + 4;
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50
+ THEN DELETE;
+
+--- and run the actual query.
+MERGE INTO sj_target t USING no_rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 60 THEN update set balance = t.balance + 6;
+
+MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 20 THEN update set balance = t.balance + 2;
+
+MERGE INTO rw_sj_target t USING rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30 THEN update set balance = t.balance + 3;
+
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40 THEN update set balance = t.balance + 4;
+
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50 THEN DELETE;
+
+MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10 THEN update set balance = t.balance + 1;
+
+select * from sj_target order by tid;
+DROP VIEW no_rw_sj_target;
+DROP VIEW rw_sj_target;
+DROP TABLE sj_target;
+
+truncate sj;
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
-- Test that references to the removed rel in lateral subqueries are replaced
-- correctly after join removal
explain (verbose, costs off)
--
2.34.1
v3-0003-use-SJE-conditionally-with-delete-update-merge-RE.patchtext/x-patch; charset=US-ASCII; name=v3-0003-use-SJE-conditionally-with-delete-update-merge-RE.patchDownload
From 555b5d66893f573875cebb03bfbe98902fd3ed4c Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Tue, 2 Jul 2024 01:45:54 +0800
Subject: [PATCH v3 3/3] use SJE conditionally with delete/update/merge
RETURNING
ExecProcessReturning can only return the newly changed slot,
cannot have the old/previously unchanged slot.
update/delete/merge returning can only return newly/changed value.
for this reason, the SJE can only apply to RETURNING, where the
var's varno is associated the original Query->resultRelation.
for example:
```
drop table if exists sj;
create table sj (a int unique, b int, c int unique);
insert into sj values (3, 1, 21);
--this can apply SJE.
explain(costs off, verbose)
UPDATE sj sq SET a = 4 FROM sj as sz WHERE sq.a = sz.a and sq.a = 3 returning sq.a;
--this can not apply SJE.
explain(costs off, verbose)
UPDATE sj sq SET a = 4 FROM sj as sz WHERE sq.a = sz.a and sq.a = 3 returning sz.a;
```
so based on this, check if root->parse->returningList associated var's varno is not
equal to root->parse->resultRelation, then not apply SJE.
MERGE:
if you specified "merge when matched" and "merge when not matched",
it will use left join, so we don't need worry about
"WHEN NOT MATCHED BY SOURCE" and "WHEN NOT MATCHED BY TARGET".
so overall the merge, we only need take care of merge_update, merge_delete.
---
src/backend/optimizer/plan/analyzejoins.c | 74 ++++-
src/test/regress/expected/join.out | 334 ++++++++++++++++++++++
src/test/regress/sql/join.sql | 148 ++++++++++
3 files changed, 554 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index eab39ee0..9aece739 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -2331,6 +2331,69 @@ self_join_candidates_cmp(const void *a, const void *b)
return 0;
}
+typedef struct
+{
+ int rt_varno;
+ bool rt_varno_exists; /* require all var nodes varno == rt_varno */
+ int sublevels_up;
+} returning_var_context;
+
+static bool
+returning_list_var_walker(Node *node, returning_var_context *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, Var))
+ {
+ Var *var = (Var *) node;
+
+ if (context->sublevels_up == 0 &&
+ var->varno != context->rt_varno)
+ {
+ context->rt_varno_exists = false;
+ return false;
+ }
+ }
+
+ if (IsA(node, Query))
+ {
+ bool result;
+
+ context->sublevels_up++;
+ result = query_tree_walker((Query *) node, returning_list_var_walker,
+ (void *) context, 0);
+ context->sublevels_up--;
+ return result;
+ }
+ return expression_tree_walker(node, returning_list_var_walker,
+ (void *) context);
+}
+
+/*
+ * check if only rt_varno exists in returninglist var
+ *
+*/
+static bool
+varno_in_returninglist(Node *node, int rt_varno, int sublevels_up)
+{
+ returning_var_context context;
+ context.rt_varno = rt_varno;
+ context.rt_varno_exists = true;
+ context.sublevels_up = 0;
+
+ if (node && IsA(node, Query))
+ {
+ Query *qry = (Query *) node;
+ query_tree_walker(qry, returning_list_var_walker,
+ (void *) &context, 0);
+ }
+ else
+ returning_list_var_walker(node, &context);
+
+ return context.rt_varno_exists;
+}
+
/*
* Find and remove useless self joins.
*
@@ -2377,9 +2440,16 @@ remove_useless_self_joins(PlannerInfo *root, List *joinlist)
return joinlist;
/*
- * we don't accept RETURNING clause with SJE.
+ * to make SJE be appliable, for the UPDATE/DELETE/MERGE RETURNING related var(s)
+ * we can only contain the original resultRelation associated rel.
+ * if others rel varno is in RETURNING var list, then applying SJE,
+ * we will seeing old (before changed) value,
+ * but RETURNING we can only see the newly changed value now.
+ *
*/
- if (root->parse && root->parse->returningList != NIL)
+ if (root->parse &&
+ root->parse->returningList != NIL &&
+ !varno_in_returninglist((Node *) root->parse->returningList, root->parse->resultRelation, 0))
return joinlist;
/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9f89b351..c523ee0d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6577,6 +6577,157 @@ EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as
-> Seq Scan on sj t1
(6 rows)
+---UPDATE/DELETE RETURNING SJE applicable cases.
+truncate sj;
+insert into sj(a,b,c) select g, g % 10, g + 10 from generate_series(1, 10) g;
+analyze sj;
+----test query that can use SJE
+explain(costs off)
+UPDATE sj sq
+SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM sj sz WHERE sq.a = sz.a and sz.a = 2 and sq.a = 2
+returning sq.a, sq.b, (select sq.c);
+ QUERY PLAN
+-------------------------
+ Update on sj sz
+ -> Seq Scan on sj sz
+ Filter: (a = 2)
+ SubPlan 1
+ -> Result
+(5 rows)
+
+explain(costs off)
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.a and sz.a = 3
+returning sq.*;
+ QUERY PLAN
+-------------------------
+ Update on sj s2
+ -> Seq Scan on sj s2
+ Filter: (a = 3)
+(3 rows)
+
+explain(costs off)
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sz.a = 4 and sq.a = 4
+returning sq.*, (select sq.a);
+ QUERY PLAN
+------------------------------------
+ Update on sj s1
+ -> Nested Loop
+ Join Filter: (s1.b = s2.b)
+ -> Seq Scan on sj s1
+ Filter: (a = 4)
+ -> Seq Scan on sj s2
+ SubPlan 1
+ -> Result
+(8 rows)
+
+explain(costs off)
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 5
+returning sq.a, sq.a * sq.b, (select sq.a + sq.b);
+ QUERY PLAN
+-------------------------
+ Delete on sj
+ -> Seq Scan on sj
+ Filter: (a = 5)
+ SubPlan 1
+ -> Result
+(5 rows)
+
+explain(costs off)
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sq.a = sz.a and sq.a = 6
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+ QUERY PLAN
+------------------------------------
+ Delete on sj s1
+ -> Nested Loop
+ Join Filter: (s1.b = s2.b)
+ -> Seq Scan on sj s1
+ Filter: (a = 6)
+ -> Seq Scan on sj s2
+(6 rows)
+
+explain(costs off)
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.b and sz.a = 7
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+ QUERY PLAN
+------------------------------------
+ Delete on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = s2.b)
+ -> Seq Scan on sj s2
+ Filter: (a = 7)
+ -> Seq Scan on sj sq
+(6 rows)
+
+----actually execute the query to validate the result
+UPDATE sj sq
+SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM sj sz WHERE sq.a = sz.a and sz.a = 2 and sq.a = 2
+returning sq.a, sq.b, (select sq.c);
+ a | b | c
+----+----+----
+ 17 | 17 | 32
+(1 row)
+
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.a and sz.a = 3
+returning sq.*;
+ a | b | c
+----+----+----
+ 18 | 18 | 38
+(1 row)
+
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sz.a = 4 and sq.a = 4
+returning sq.*, (select sq.a);
+ a | b | c | a
+----+----+----+----
+ 19 | 19 | 44 | 19
+(1 row)
+
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 5
+returning sq.a, sq.a * sq.b, (select sq.a + sq.b);
+ a | ?column? | ?column?
+---+----------+----------
+ 5 | 25 | 10
+(1 row)
+
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sq.a = sz.a and sq.a = 6
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+ a | b | c | tbl
+---+---+----+-----
+ 6 | 6 | 16 | sj
+(1 row)
+
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.b and sz.a = 7
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+ a | b | c | tbl
+---+---+----+-----
+ 7 | 7 | 17 | sj
+(1 row)
+
+select * from sj order by a;
+ a | b | c
+----+----+----
+ 1 | 1 | 11
+ 8 | 8 | 18
+ 9 | 9 | 19
+ 10 | 0 | 20
+ 17 | 17 | 32
+ 18 | 18 | 38
+ 19 | 19 | 44
+(7 rows)
+
------------------- MERGE SJE table setup
CREATE TABLE sj_target (tid integer primary key, balance integer) WITH (autovacuum_enabled=off);
INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5, 50), (6, 60);
@@ -6702,6 +6853,189 @@ select * from sj_target order by tid;
6 | 66
(5 rows)
+-------------------MERGRE RETURNING SJE TEST--------------------------------
+TRUNCATE sj_target;
+INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5, 50), (6, 60);
+--"when matched" and "when not matched" both specified
+--because of "when not matched" then can only use left join
+--therefore cannot apply SJE
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN UPDATE SET balance = t.balance + s.balance, tid = t.tid + s.tid
+ WHEN NOT MATCHED BY SOURCE AND t.balance = 20 THEN DELETE
+ RETURNING t.*, (select t.balance), merge_action();
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop Left Join
+ -> Seq Scan on sj_target t
+ -> Index Scan using sj_target_pkey on sj_target s
+ Index Cond: (tid = t.tid)
+ SubPlan 1
+ -> Result
+(7 rows)
+
+---"when not matched" using left join, SJE cannot be applied
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+WHEN NOT MATCHED BY SOURCE AND t.balance = 10 THEN
+ DELETE
+returning t.*;
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop Left Join
+ -> Seq Scan on sj_target t
+ -> Index Scan using sj_target_pkey on sj_target s
+ Index Cond: (tid = t.tid)
+(5 rows)
+
+---"when not matched by target" using left join, SJE cannot be applied
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target s ON t.tid = s.tid
+ WHEN NOT MATCHED THEN
+ INSERT VALUES (s.tid, 11)
+ RETURNING t.*, merge_action();
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop Left Join
+ -> Seq Scan on sj_target s
+ -> Index Scan using sj_target_pkey on sj_target t
+ Index Cond: (tid = s.tid)
+(5 rows)
+
+---returning with multiple rel, merge returning cannot be applied.
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50
+ THEN UPDATE SET balance = t.balance + s.balance + 50
+ RETURNING s.*, t.balance, merge_action();
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop
+ -> Seq Scan on sj_target t
+ -> Index Scan using sj_target_pkey on sj_target s
+ Index Cond: (tid = t.tid)
+(5 rows)
+
+--------the following (explain) query can use SJE.
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (select t.balance), merge_action();
+ QUERY PLAN
+-------------------------------
+ Merge on sj_target s
+ -> Seq Scan on sj_target s
+ SubPlan 1
+ -> Result
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+MERGE INTO rw_sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND s.balance = 20
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+ QUERY PLAN
+-------------------------------------------------------------
+ Merge on sj_target s0
+ -> Nested Loop
+ Join Filter: (s0.balance = s1.balance)
+ -> Seq Scan on sj_target s1
+ -> Materialize
+ -> Bitmap Heap Scan on sj_target s0
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(9 rows)
+
+EXPLAIN (COSTS OFF)
+MERGE INTO rw_sj_target t USING (select s1.tid, s1.balance from sj_target s0 join rw_sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+ QUERY PLAN
+-------------------------------------------------------------
+ Merge on sj_target
+ -> Nested Loop
+ Join Filter: (s0.balance = sj_target.balance)
+ -> Seq Scan on sj_target s0
+ -> Materialize
+ -> Bitmap Heap Scan on sj_target
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(9 rows)
+
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40 AND s.balance = 40
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (SELECT t.tableoid::regclass), merge_action();
+ QUERY PLAN
+------------------------------------------------
+ Merge on sj_target s0
+ -> Nested Loop
+ Join Filter: (s0.balance = s1.balance)
+ -> Seq Scan on sj_target s0
+ -> Materialize
+ -> Seq Scan on sj_target s1
+ SubPlan 1
+ -> Result
+(8 rows)
+
+--------and actually running query validate the result.
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (select t.balance), merge_action();
+ tid | balance | balance | merge_action
+-----+---------+---------+--------------
+ 12 | 120 | 120 | UPDATE
+(1 row)
+
+MERGE INTO rw_sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND s.balance = 20
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+ tid | balance | merge_action
+-----+---------+--------------
+ 14 | 140 | UPDATE
+(1 row)
+
+MERGE INTO rw_sj_target t USING (select s1.tid, s1.balance from sj_target s0 join rw_sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+ tid | balance | merge_action
+-----+---------+--------------
+ 16 | 160 | UPDATE
+(1 row)
+
+MERGE INTO sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40 AND s.balance = 40
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (SELECT t.tableoid::regclass), merge_action();
+ tid | balance | tableoid | merge_action
+-----+---------+-----------+--------------
+ 18 | 180 | sj_target | UPDATE
+(1 row)
+
+select * from sj_target;
+ tid | balance
+-----+---------
+ 5 | 50
+ 6 | 60
+ 12 | 120
+ 14 | 140
+ 16 | 160
+ 18 | 180
+(6 rows)
+
DROP VIEW no_rw_sj_target;
DROP VIEW rw_sj_target;
DROP TABLE sj_target;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 4afbaf79..0d5cb057 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2484,6 +2484,72 @@ EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as
EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.a = sq.c;
+---UPDATE/DELETE RETURNING SJE applicable cases.
+truncate sj;
+insert into sj(a,b,c) select g, g % 10, g + 10 from generate_series(1, 10) g;
+analyze sj;
+
+----test query that can use SJE
+explain(costs off)
+UPDATE sj sq
+SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM sj sz WHERE sq.a = sz.a and sz.a = 2 and sq.a = 2
+returning sq.a, sq.b, (select sq.c);
+
+explain(costs off)
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.a and sz.a = 3
+returning sq.*;
+
+explain(costs off)
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sz.a = 4 and sq.a = 4
+returning sq.*, (select sq.a);
+
+explain(costs off)
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 5
+returning sq.a, sq.a * sq.b, (select sq.a + sq.b);
+
+explain(costs off)
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sq.a = sz.a and sq.a = 6
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+
+explain(costs off)
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.b and sz.a = 7
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+
+----actually execute the query to validate the result
+UPDATE sj sq
+SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM sj sz WHERE sq.a = sz.a and sz.a = 2 and sq.a = 2
+returning sq.a, sq.b, (select sq.c);
+
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.a and sz.a = 3
+returning sq.*;
+
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sz.a = 4 and sq.a = 4
+returning sq.*, (select sq.a);
+
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 5
+returning sq.a, sq.a * sq.b, (select sq.a + sq.b);
+
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sq.a = sz.a and sq.a = 6
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.b and sz.a = 7
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+
+select * from sj order by a;
------------------- MERGE SJE table setup
CREATE TABLE sj_target (tid integer primary key, balance integer) WITH (autovacuum_enabled=off);
INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5, 50), (6, 60);
@@ -2541,6 +2607,88 @@ MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
WHEN MATCHED AND t.balance = 10 THEN update set balance = t.balance + 1;
select * from sj_target order by tid;
+-------------------MERGRE RETURNING SJE TEST--------------------------------
+TRUNCATE sj_target;
+INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5, 50), (6, 60);
+
+--"when matched" and "when not matched" both specified
+--because of "when not matched" then can only use left join
+--therefore cannot apply SJE
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN UPDATE SET balance = t.balance + s.balance, tid = t.tid + s.tid
+ WHEN NOT MATCHED BY SOURCE AND t.balance = 20 THEN DELETE
+ RETURNING t.*, (select t.balance), merge_action();
+
+---"when not matched" using left join, SJE cannot be applied
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+WHEN NOT MATCHED BY SOURCE AND t.balance = 10 THEN
+ DELETE
+returning t.*;
+
+---"when not matched by target" using left join, SJE cannot be applied
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target s ON t.tid = s.tid
+ WHEN NOT MATCHED THEN
+ INSERT VALUES (s.tid, 11)
+ RETURNING t.*, merge_action();
+
+---returning with multiple rel, merge returning cannot be applied.
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50
+ THEN UPDATE SET balance = t.balance + s.balance + 50
+ RETURNING s.*, t.balance, merge_action();
+
+--------the following (explain) query can use SJE.
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (select t.balance), merge_action();
+
+EXPLAIN (COSTS OFF)
+MERGE INTO rw_sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND s.balance = 20
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+
+EXPLAIN (COSTS OFF)
+MERGE INTO rw_sj_target t USING (select s1.tid, s1.balance from sj_target s0 join rw_sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40 AND s.balance = 40
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (SELECT t.tableoid::regclass), merge_action();
+--------and actually running query validate the result.
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (select t.balance), merge_action();
+
+MERGE INTO rw_sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND s.balance = 20
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+
+MERGE INTO rw_sj_target t USING (select s1.tid, s1.balance from sj_target s0 join rw_sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+
+MERGE INTO sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40 AND s.balance = 40
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (SELECT t.tableoid::regclass), merge_action();
+
+select * from sj_target;
+
DROP VIEW no_rw_sj_target;
DROP VIEW rw_sj_target;
DROP TABLE sj_target;
--
2.34.1
v3-0001-Remove-useless-self-joins.patchtext/x-patch; charset=UTF-8; name=v3-0001-Remove-useless-self-joins.patchDownload
From d2498b52ea0821e4b432939f5eb5645df6930c03 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Fri, 14 Jun 2024 17:14:16 +0800
Subject: [PATCH v3 1/3] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1228 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 7 +-
src/backend/rewrite/rewriteManip.c | 95 ++
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 35 +-
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/include/rewrite/rewriteManip.h | 2 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1064 ++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 486 ++++++++
src/tools/pgindent/typedefs.list | 3 +
17 files changed, 2966 insertions(+), 82 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 698169af..7918c4ba 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5593,6 +5593,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c0fcc7d7..2230b131 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3440,6 +3440,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3495,6 +3511,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3546,6 +3563,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3588,7 +3623,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index c3fd4a81..bb145977 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -29,12 +30,31 @@
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. It is needed to perform a sorting procedure and simplify
+ * the search of SJE-candidate baserels referencing the same database relation.
+ * Having collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts to
+ * remove self-joins.
+ * Preliminary sorting prevents quadratic behaviour that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -42,14 +62,16 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -87,7 +109,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -291,8 +313,8 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
continue; /* not mergejoinable */
/*
- * Check if the clause has the form "outer op inner" or "inner op
- * outer", and if so mark which side is inner.
+ * Check if clause has the form "outer op inner" or "inner op outer",
+ * and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
innerrel->relids))
@@ -306,7 +328,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -318,31 +340,27 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
- * Remove references to the rel from other baserels' attr_needed arrays.
+ * Remove references to the rel from other baserels' attr_needed arrays
+ * and lateral_vars lists.
*/
for (rti = 1; rti < root->simple_rel_array_size; rti++)
{
@@ -364,19 +382,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ replace_relid(otherrel->attr_needed[attroff], relid, subst);
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], ojrelid);
+ replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
+
+ /* Update lateral_vars list. */
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -390,30 +412,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- /*
- * initsplan.c is fairly cavalier about allowing SpecialJoinInfos'
- * lefthand/righthand relid sets to be shared with other data
- * structures. Ensure that we don't modify the original relid sets.
- * (The commute_xxx sets are always per-SpecialJoinInfo though.)
- */
- sjinf->min_lefthand = bms_copy(sjinf->min_lefthand);
- sjinf->min_righthand = bms_copy(sjinf->min_righthand);
- sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
- sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
+
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
}
/*
@@ -434,10 +449,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -447,18 +462,58 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
/*
* Remove any joinquals referencing the rel from the joininfo lists.
@@ -856,9 +911,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the outer_exprs, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -871,10 +931,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1188,9 +1249,33 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
+ * additional clauses from a baserestrictinfo list that were used to prove
+ * uniqueness. A non NULL 'extra_clauses' indicates that we're checking
+ * for self-join and correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1205,17 +1290,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels, because we need extra clauses to be valid for our case.
+ * Also, for self-join checking we've filtered the clauses list. Thus,
+ * for a self-join search, we can match only the result cached for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1232,7 +1328,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1245,10 +1342,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1294,7 +1397,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1324,17 +1428,987 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+ ListCell *lc;
+ ListCell *lc1;
+
+ foreach(lc, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach(lc1, new_members)
+ {
+ EquivalenceMember *other = lfirst_node(EquivalenceMember, lc1);
+
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach(lc, ec->ec_sources)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach(lc1, new_sources)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, lc1);
+
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+typedef struct
+{
+ int rt_index;
+ int new_index;
+ int cnt;
+ int sublevels_up;
+} restore_rangetblref_context;
+
+static bool
+restore_rangetblref_walker(Node *node, restore_rangetblref_context *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, RangeTblRef))
+ {
+ RangeTblRef *rtr = (RangeTblRef *) node;
+
+ if (context->sublevels_up == 0 && context->cnt == 0 &&
+ rtr->rtindex == context->rt_index)
+ {
+ rtr->rtindex = context->new_index;
+ context->cnt++;
+ }
+ //TODO, here, should we return false immediately.
+ // return false;
+ }
+
+ if (IsA(node, Query))
+ {
+ /* Recurse into subselects */
+ bool result;
+
+ context->sublevels_up++;
+ result = query_tree_walker((Query *) node, restore_rangetblref_walker,
+ (void *) context, 0);
+ context->sublevels_up--;
+ return result;
+ }
+ return expression_tree_walker(node, restore_rangetblref_walker,
+ (void *) context);
+}
+
+/*
+ * in remove_self_join_rel, ChangeVarNodes may change some of
+ * the RangeTblRef->rtindex in the root Query, here we do the change it back.
+ *
+ * e.g. ChangeVarNodes maybe change
+ * root->parse subnode JoinExpr's larg, rarg (RangeTblRef->rtindex)
+ * from (1,2) to (2,2), we need to change it back to (1,2).
+ * For node JoinExpr, the query tree WALK logic is through larg,then rarg,
+ * we will change the first "2" to "1", not the second.
+ *
+*/
+static void
+restore_rangetblref(Node *node, int rt_index, int new_index, int cnt, int sublevels_up)
+{
+ restore_rangetblref_context context;
+ context.rt_index = rt_index;
+ context.new_index = new_index;
+ context.cnt = cnt;
+ context.sublevels_up = sublevels_up;
+
+ if (node && IsA(node, Query))
+ {
+ Query *qry = (Query *) node;
+ query_tree_walker(qry, restore_rangetblref_walker,
+ (void *) &context, 0);
+ }
+ else
+ restore_rangetblref_walker(node, &context);
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach(lc, joininfos)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach(lc, toRemove->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach(lc, binfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach(lc, jinfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->joininfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures */
+ ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0);
+ /* restore the rangetblref in a proper order. */
+ restore_rangetblref((Node *) root->parse, toKeep->relid, toRemove->relid, 0, 0);
+
+ /* See remove_self_joins_one_group() */
+ Assert(root->parse->resultRelation != toRemove->relid);
+ Assert(root->parse->resultRelation != toKeep->relid);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ replace_relid(root->all_result_relids, toRemove->relid, toKeep->relid);
+ replace_relid(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ /*
+ * We don't accept result relation as either source or target relation
+ * of SJE, because result relation has different behavior in
+ * EvalPlanQual() and RETURNING clause.
+ */
+ if (root->parse->resultRelation == r)
+ continue;
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ if (root->parse->resultRelation == k)
+ continue;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be removed,
+ * and these relations should not have TABLESAMPLE clauses
+ * specified. Removing a relation with TABLESAMPLE clause could
+ * potentially change the syntax of the query.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5..075d36c7 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -230,6 +230,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 1c69c6e9..c5a8d00b 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -662,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
@@ -670,6 +670,9 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
* be an RTE corresponding to each setop's output.
+ * Note, we use this not subquery's targetlist but subroot->parse's
+ * targetlist, because it was revised by self-join removal. subquery's
+ * targetlist might contain the references to the removed relids.
*/
if (pNumGroups)
{
@@ -682,7 +685,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index 191f2dc0..85b0d91f 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -540,6 +540,33 @@ offset_relid_set(Relids relids, int offset)
return result;
}
+/*
+ * Substitute newId by oldId in relids.
+ *
+ * We must make a copy of the original Bitmapset before making any
+ * modifications, because the same pointer to it might be shared among
+ * different places.
+ * Also, this function can be used in 'delete only' mode (newId < 0).
+ * It allows us to utilise the same code in the remove_useless_joins and the
+ * remove_self_joins features.
+ */
+Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (oldId < 0)
+ return relids;
+
+ /* Delete relid without substitution. */
+ if (newId < 0)
+ return bms_del_member(bms_copy(relids), oldId);
+
+ /* Substitute newId for oldId. */
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
/*
* ChangeVarNodes - adjust Var nodes for a specific change of RT index
*
@@ -637,6 +664,74 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ replace_relid(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ replace_relid(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ replace_relid(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ replace_relid(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ replace_relid(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ replace_relid(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * for SJE (self join elimination), changing varnos will make t1.a = t2.a
+ * to "t1.a = t1.a", which is always true, later processing it will be
+ * optimized out, but we also need to add a not null qual (NullTest).
+ *
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 46c258be..d17a4c70 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -986,6 +986,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 2ba297c1..b59602a1 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -728,7 +728,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -967,7 +967,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3432,4 +3432,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 5e88c022..fff4b69d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index aafc1737..f2e3fa4c 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -108,6 +109,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index ac6d2049..11593d36 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -41,6 +42,7 @@ typedef enum ReplaceVarsNoMatchOption
} ReplaceVarsNoMatchOption;
+extern Bitmapset *replace_relid(Relids relids, int oldId, int newId);
extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
List *src_rtable, List *src_perminfos);
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047..3d5de283 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 6b16c3a6..16327779 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6158,6 +6158,1070 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: (oid < '10'::oid)
+(6 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+-------------------------------------------
+ HashSetOp Except All
+ -> Append
+ -> Subquery Scan on "*SELECT* 1"
+ -> Seq Scan on emp1 c2
+ -> Subquery Scan on "*SELECT* 2"
+ -> Seq Scan on emp1 c3
+(6 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index dbfd0c13..2f3eb4e7 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -153,10 +153,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a31..77dd964e 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 8bfe3b7b..0cc6c692 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2321,6 +2321,492 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 61ad417c..4039ba7e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -388,6 +388,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2563,6 +2564,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -3990,6 +3992,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.34.1
On Mon, Jun 17, 2024 at 3:00 AM jian he <jian.universality@gmail.com> wrote:
On Mon, May 6, 2024 at 11:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I want to go on record right now as disagreeing with the plan proposed
in the commit message for the revert commit, namely, committing this
again early in the v18 cycle. I don't think Tom would have proposed
reverting this feature unless he believed that it had more serious
problems than could be easily fixed in a short period of time. I think
that concern is well-founded, given the number of fixes that were
committed. It seems likely that the patch needs significant rework and
stabilization before it gets committed again, and I think it shouldn't
be committed again without explicit agreement from Tom or one of the
other committers who have significant experience with the query
planner.FWIW I accept some of the blame here, for not having paid any
attention to the SJE work earlier. I had other things on my mind
for most of last year, and not enough bandwidth to help.The main thing I'd like to understand before we try this again is
why SJE needed so much new query-tree-manipulation infrastructure.
I would have expected it to be very similar to the left-join
elimination we do already, and therefore to mostly just share the
existing infrastructure. (I also harbor suspicions that some of
the new code existed just because someone didn't research what
was already there --- for instance, the now-removed replace_varno
sure looks like ChangeVarNodes should have been used instead.)i have looked around the code.
about replace_varno and ChangeVarNodes:ChangeVarNodes
have
````
if (IsA(node, RangeTblRef))
{
RangeTblRef *rtr = (RangeTblRef *) node;if (context->sublevels_up == 0 &&
rtr->rtindex == context->rt_index)
rtr->rtindex = context->new_index;
/* the subquery itself is visited separately */
return false;
}
````
if ChangeVarNodes executed the above code in remove_useless_self_joins and
remove_self_joins_recurse. the joinlist(RangeTblRef) will change from (1,2)
to (2,2). then later, remove_rel_from_joinlist cannot remove the 1,
*nremoved will be zero.
then the below code error branch will be executed.
````
joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
if (nremoved != 1)
elog(ERROR, "failed to find relation %d in joinlist", relid);
```
Did you manage to overcome this problem in your patch? If not, why do
regression tests pass while this seems to affect pretty much every
self-join removal? If so, how did you do that?
---------------------------------------------------------------------
replace_varno and replace_varno_walker didn't replace
Query->resultRelation, Query->mergeTargetRelation
as ChangeVarNodes did.then replace_varno will have problems with DELETE, UPDATE, MERGE
someway.
ChangeVarNodes solved this problem.Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
result relation). While I don't claim that that destroyed the entire
use case for SJE, it certainly knocked its usefulness down by many
notches, maybe even to the point where it's not worth putting in the
effort needed to get it to re-committability. So I think we need to
look harder at finding a way around that. Is the concern that
RETURNING should return either old or new values depending on which
RTE is mentioned? If so, maybe the feature Dean has proposed to
allow RETURNING to access old values [1] is a prerequisite to moving
forward. Alternatively, perhaps it'd be good enough to forbid SJE
only when the non-target relation is actually mentioned in RETURNING.regards, tom lane
[1] /messages/by-id/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
if only SELECT, no worth to make it being committed,
do you think support DML but no support RETURNING worth the effort?
It appears you didn't try to address the EPQ problem, which seems to
me even more serious than the RETURNING problem.
See the following example.
Session 1
# create table test (id int primary key, val int);
# insert into test values (1,1);
# begin;
# update test set val = val + 1 where id = 1;
Session 2
# update test set val = t.val + 1 from test t where test.id = t.id;
(wait)
Session 1
# commit;
With v3 patch the query of session 2 fails on assert even before
starting to wait for the tuple lock. But even if we fix that, I
expect that after SJE this example would result in val = 3. Without
SJE, it would result with val = 2, because during EPQ alias t still
references the row version read according to the snapshot. In order
to overcome that we need to distinguish Var, which points to the
latest version during EPQ, and Var, which points to the snapshot
version during EPQ. Probably I'm wrong, but this change seems to have
quite terrific complexity.
The way to workaround that could be to apply SJE for target relations
only on REPEATABLE READ or SERIALIZABLE. But that would require
introducing dependency of query planning on the isolation level. I'm
not quite happy with this already. Also, this would lead to very
non-obvious user-visible gotcha that data-modification queries have
better performance on higher isolation levels.
BTW, I don't think SJE for just SELECTs doesn't make sense. For me,
it seems reasonable to take a step-by-step approach to first nail down
SJE just for SELECTs (that looks enough challenge for a single
release). And then attack the problems we have with data-modification
queries.
Links.
1. /messages/by-id/2285d5d0-f330-e8b6-9ee5-b2b90e44409b@postgrespro.ru
------
Regards,
Alexander Korotkov
Supabase
On Wed, Jul 3, 2024 at 11:39 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Jun 17, 2024 at 3:00 AM jian he <jian.universality@gmail.com> wrote:
On Mon, May 6, 2024 at 11:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I want to go on record right now as disagreeing with the plan proposed
in the commit message for the revert commit, namely, committing this
again early in the v18 cycle. I don't think Tom would have proposed
reverting this feature unless he believed that it had more serious
problems than could be easily fixed in a short period of time. I think
that concern is well-founded, given the number of fixes that were
committed. It seems likely that the patch needs significant rework and
stabilization before it gets committed again, and I think it shouldn't
be committed again without explicit agreement from Tom or one of the
other committers who have significant experience with the query
planner.FWIW I accept some of the blame here, for not having paid any
attention to the SJE work earlier. I had other things on my mind
for most of last year, and not enough bandwidth to help.The main thing I'd like to understand before we try this again is
why SJE needed so much new query-tree-manipulation infrastructure.
I would have expected it to be very similar to the left-join
elimination we do already, and therefore to mostly just share the
existing infrastructure. (I also harbor suspicions that some of
the new code existed just because someone didn't research what
was already there --- for instance, the now-removed replace_varno
sure looks like ChangeVarNodes should have been used instead.)i have looked around the code.
about replace_varno and ChangeVarNodes:ChangeVarNodes
have
````
if (IsA(node, RangeTblRef))
{
RangeTblRef *rtr = (RangeTblRef *) node;if (context->sublevels_up == 0 &&
rtr->rtindex == context->rt_index)
rtr->rtindex = context->new_index;
/* the subquery itself is visited separately */
return false;
}
````
if ChangeVarNodes executed the above code in remove_useless_self_joins and
remove_self_joins_recurse. the joinlist(RangeTblRef) will change from (1,2)
to (2,2). then later, remove_rel_from_joinlist cannot remove the 1,
*nremoved will be zero.
then the below code error branch will be executed.
````
joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
if (nremoved != 1)
elog(ERROR, "failed to find relation %d in joinlist", relid);
```Did you manage to overcome this problem in your patch? If not, why do
regression tests pass while this seems to affect pretty much every
self-join removal? If so, how did you do that?
in remove_self_join_rel, i have
```ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0);```
which will change the joinlist(RangeTblRef) from (1,2) to (2,2).
Immediately after this call, I wrote a function (restore_rangetblref)
to restore the joinlist as original (1,2).
then remove_rel_from_joinlist won't error out.
see remove_self_join_rel, restore_rangetblref.
Andrei Lepikhov:
+ /* Replace varno in all the query structures */
+ replace_varno((Node *) root->parse, toRemove->relid, toKeep->relid);
So Andrei Lepikhov's change didn't touch joinlist,
Query->resultRelation, Query->mergeTargetRelation.
Then in v3-0002 I tried to make SJE work with UPDATE, i thought it worked well,
because ChangeVarNodes also takes care of Query->resultRelation,
Query->mergeTargetRelation.
then later your EPQ demenonsate shows that's not enough.
so, in summary, in v3-0001, by changing all replace_varno to ChangeVarNodes
paves ways to make SJE apply to UPDATE/DELETE/MERGE.
It's just that we need to reverse some effects of ChangeVarNodes.
(restore_rangetblref)
Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
result relation). While I don't claim that that destroyed the entire
use case for SJE, it certainly knocked its usefulness down by many
notches, maybe even to the point where it's not worth putting in the
effort needed to get it to re-committability. So I think we need to
look harder at finding a way around that. Is the concern that
RETURNING should return either old or new values depending on which
RTE is mentioned? If so, maybe the feature Dean has proposed to
allow RETURNING to access old values [1] is a prerequisite to moving
forward. Alternatively, perhaps it'd be good enough to forbid SJE
only when the non-target relation is actually mentioned in RETURNING.It appears you didn't try to address the EPQ problem, which seems to
me even more serious than the RETURNING problem.See the following example.
Session 1
# create table test (id int primary key, val int);
# insert into test values (1,1);
# begin;
# update test set val = val + 1 where id = 1;Session 2
# update test set val = t.val + 1 from test t where test.id = t.id;
(wait)Session 1
# commit;
current mechanism, in this example context,
SJE can translate ```update test set val = t.val + 1 from test t where
test.id = t.id;``` as good as to
```update test set val = val + 1```.
if we replace it that way, then this example would result val = 3.
but without SJE,
```update test set val = t.val + 1 from test t where test.id = t.id;```
will result val = 2.
you mentioned the EPQ problem, previously i don't know what that means.
now i see, I feel like it is quite challenging to resolve it.
On Thu, Jul 4, 2024 at 5:15 AM jian he <jian.universality@gmail.com> wrote:
in remove_self_join_rel, i have
```ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0);```
which will change the joinlist(RangeTblRef) from (1,2) to (2,2).
Immediately after this call, I wrote a function (restore_rangetblref)
to restore the joinlist as original (1,2).
then remove_rel_from_joinlist won't error out.
see remove_self_join_rel, restore_rangetblref.
Thank you, now this is clear. Could we add additional parameters to
ChangeVarNodes() instead of adding a new function which reverts part
of changes.
current mechanism, in this example context,
SJE can translate ```update test set val = t.val + 1 from test t where
test.id = t.id;``` as good as to
```update test set val = val + 1```.
if we replace it that way, then this example would result val = 3.but without SJE,
```update test set val = t.val + 1 from test t where test.id = t.id;```
will result val = 2.you mentioned the EPQ problem, previously i don't know what that means.
Yes, I guessed so. I should have come with more detailed explanation.
now i see, I feel like it is quite challenging to resolve it.
Yep. Glad to see we are on the same page.
This is why I think we could leave SJE for target relation of
modification queries for future. I'd like to not devalue SELECT-only
SJE, given that this is a step forward anyway.
------
Regards,
Alexander Korotkov
Supabase
On Thu, Jul 4, 2024 at 11:04 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Thu, Jul 4, 2024 at 5:15 AM jian he <jian.universality@gmail.com> wrote:
in remove_self_join_rel, i have
```ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0);```
which will change the joinlist(RangeTblRef) from (1,2) to (2,2).
Immediately after this call, I wrote a function (restore_rangetblref)
to restore the joinlist as original (1,2).
then remove_rel_from_joinlist won't error out.
see remove_self_join_rel, restore_rangetblref.Thank you, now this is clear. Could we add additional parameters to
ChangeVarNodes() instead of adding a new function which reverts part
of changes.
I didn't dare to. we have 42 occurrences of ChangeVarNodes.
adding a parameter to it only for one location seems not intuitive.
Now I have tried.
changing to
`ChangeVarNodes(Node *node, int rt_index, int new_index, int
sublevels_up, bool change_RangeTblRef)`
/* Replace varno in all the query structures */
ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
```
it seems to work, pass the regression test.
```ChangeVarNodes((Node *) root->parse, toRemove->relid,
toKeep->relid, 0, false);```
is in remove_self_join_rel, remove_self_joins_one_group,
remove_self_joins_recurse.
all other places are ```ChangeVarNodes((Node *) root->parse,
toRemove->relid, toKeep->relid, 0, true);```
so ChangeVarNodes add a parameter will only influence the SJE feature.
I also tried, save to a temp list, but it did not work.
original_fromlist = list_copy_deep(root->parse->jointree->fromlist);
/* Replace varno in all the query structures */
ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0);
root->parse->jointree->fromlist = list_copy(original_fromlist);
On Thu, Jul 4, 2024 at 11:40 AM jian he <jian.universality@gmail.com> wrote:
On Thu, Jul 4, 2024 at 11:04 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Thu, Jul 4, 2024 at 5:15 AM jian he <jian.universality@gmail.com> wrote:
in remove_self_join_rel, i have
```ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0);```
which will change the joinlist(RangeTblRef) from (1,2) to (2,2).
Immediately after this call, I wrote a function (restore_rangetblref)
to restore the joinlist as original (1,2).
then remove_rel_from_joinlist won't error out.
see remove_self_join_rel, restore_rangetblref.Thank you, now this is clear. Could we add additional parameters to
ChangeVarNodes() instead of adding a new function which reverts part
of changes.I didn't dare to. we have 42 occurrences of ChangeVarNodes.
adding a parameter to it only for one location seems not intuitive.Now I have tried.
changing to
`ChangeVarNodes(Node *node, int rt_index, int new_index, int
sublevels_up, bool change_RangeTblRef)`/* Replace varno in all the query structures */
ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
```it seems to work, pass the regression test.
```ChangeVarNodes((Node *) root->parse, toRemove->relid,
toKeep->relid, 0, false);```
is in remove_self_join_rel, remove_self_joins_one_group,
remove_self_joins_recurse.
all other places are ```ChangeVarNodes((Node *) root->parse,
toRemove->relid, toKeep->relid, 0, true);```
so ChangeVarNodes add a parameter will only influence the SJE feature.
Good. But I think it's not necessary to to replace function signature
in all the 42 occurrences. This will make our patch unnecessarily
conflict with others. Instead we can have two functions
ChangeVarNodes(original function signature) and
ChangeVarNodesExtended(extended function signature). Then existing
occurrences can still use ChangeVarNodes(), which will be just
shortcut for ChangeVarNodesExtended().
------
Regards,
Alexander Korotkov
Supabase
On 7/2/24 07:25, jian he wrote:
to make sure it's correct, I have added a lot of tests,
Some of this may be contrived, maybe some of the tests are redundant.
Thanks for your job!
I passed through the patches and have some notes:
1. Patch 0001 has not been applied anymore since the previous week's
changes in the core. Also, there is one place with trailing whitespace.
Looking into the 0002 and 0003 patches, I think they 1) should be merged
and 2) It makes sense to use the already existing pull_varnos_of_level
routine instead of a new walker. See the patches in the attachment as a
sketch.
Also, I'm not sure about the tests. It looks like we have a lot of new
tests.
However, the main issue mentioned above is the correctness of relid
replacement in planner structures.
We have the machinery to check and replace relids in a Query. But
PlannerInfo is a bin for different stuff that the optimisation needs to
convert the parse tree to the corresponding cloud of paths.
A good demo of the problem is the introduction of the JoinDomain structure:
It contains a relids field and has no tests for that. We haven't known
for a long time about the issue of SJE not replacing the relid in this
structure.
The approach with 'Relation Alias' mentioned by Alexander raises many
hard questions about accessing simple_rel_array directly or, much worse,
about checking the scope of some clause where we didn't touch
RelOptInfo, just compare two relids fields.
The safest decision would be to restart query planning over parse tree
with removed self-joins, but now planner code isn't ready for that yet.
But maybe we should put this problem on the shoulders of a developer and
made something like with nodes: perl script which will generate walker
switch over PlannerInfo structure?
--
regards, Andrei Lepikhov
Attachments:
v4-0001-Remove-useless-self-joins.patchtext/x-patch; charset=UTF-8; name=v4-0001-Remove-useless-self-joins.patchDownload
From 52f575b72ceaf6dd29e1f826951416965e5b348b Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 9 Jul 2024 10:34:20 +0700
Subject: [PATCH 1/2] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/equivclass.c | 3 +-
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1228 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 7 +-
src/backend/rewrite/rewriteManip.c | 95 ++
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 35 +-
src/include/optimizer/optimizer.h | 2 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/include/rewrite/rewriteManip.h | 2 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1064 ++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 486 ++++++++
src/tools/pgindent/typedefs.list | 3 +
19 files changed, 2970 insertions(+), 83 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f627a3e63c..39ff4d8692 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5612,6 +5612,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 51d806326e..b186372681 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -856,7 +856,8 @@ find_computable_ec_member(PlannerInfo *root,
exprvars = pull_var_clause((Node *) em->em_expr,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ PVC_INCLUDE_PLACEHOLDERS |
+ PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc2, exprvars)
{
if (!is_exprlist_member(lfirst(lc2), exprs))
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c0fcc7d78d..2230b13104 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3440,6 +3440,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3495,6 +3511,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3546,6 +3563,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3588,7 +3623,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index c3fd4a81f8..bb14597762 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -29,12 +30,31 @@
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. It is needed to perform a sorting procedure and simplify
+ * the search of SJE-candidate baserels referencing the same database relation.
+ * Having collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts to
+ * remove self-joins.
+ * Preliminary sorting prevents quadratic behaviour that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -42,14 +62,16 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -87,7 +109,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -291,8 +313,8 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
continue; /* not mergejoinable */
/*
- * Check if the clause has the form "outer op inner" or "inner op
- * outer", and if so mark which side is inner.
+ * Check if clause has the form "outer op inner" or "inner op outer",
+ * and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
innerrel->relids))
@@ -306,7 +328,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -318,31 +340,27 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
- * Remove references to the rel from other baserels' attr_needed arrays.
+ * Remove references to the rel from other baserels' attr_needed arrays
+ * and lateral_vars lists.
*/
for (rti = 1; rti < root->simple_rel_array_size; rti++)
{
@@ -364,19 +382,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ replace_relid(otherrel->attr_needed[attroff], relid, subst);
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], ojrelid);
+ replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
+
+ /* Update lateral_vars list. */
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -390,30 +412,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- /*
- * initsplan.c is fairly cavalier about allowing SpecialJoinInfos'
- * lefthand/righthand relid sets to be shared with other data
- * structures. Ensure that we don't modify the original relid sets.
- * (The commute_xxx sets are always per-SpecialJoinInfo though.)
- */
- sjinf->min_lefthand = bms_copy(sjinf->min_lefthand);
- sjinf->min_righthand = bms_copy(sjinf->min_righthand);
- sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
- sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
+
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
}
/*
@@ -434,10 +449,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -447,18 +462,58 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
/*
* Remove any joinquals referencing the rel from the joininfo lists.
@@ -856,9 +911,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the outer_exprs, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -871,10 +931,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1188,9 +1249,33 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
+ * additional clauses from a baserestrictinfo list that were used to prove
+ * uniqueness. A non NULL 'extra_clauses' indicates that we're checking
+ * for self-join and correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1205,17 +1290,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels, because we need extra clauses to be valid for our case.
+ * Also, for self-join checking we've filtered the clauses list. Thus,
+ * for a self-join search, we can match only the result cached for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1232,7 +1328,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1245,10 +1342,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1294,7 +1397,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1324,17 +1428,987 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+ ListCell *lc;
+ ListCell *lc1;
+
+ foreach(lc, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach(lc1, new_members)
+ {
+ EquivalenceMember *other = lfirst_node(EquivalenceMember, lc1);
+
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach(lc, ec->ec_sources)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach(lc1, new_sources)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, lc1);
+
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+typedef struct
+{
+ int rt_index;
+ int new_index;
+ int cnt;
+ int sublevels_up;
+} restore_rangetblref_context;
+
+static bool
+restore_rangetblref_walker(Node *node, restore_rangetblref_context *context)
+{
+ if (node == NULL)
+ return false;
+
+ if (IsA(node, RangeTblRef))
+ {
+ RangeTblRef *rtr = (RangeTblRef *) node;
+
+ if (context->sublevels_up == 0 && context->cnt == 0 &&
+ rtr->rtindex == context->rt_index)
+ {
+ rtr->rtindex = context->new_index;
+ context->cnt++;
+ }
+ //TODO, here, should we return false immediately.
+ // return false;
+ }
+
+ if (IsA(node, Query))
+ {
+ /* Recurse into subselects */
+ bool result;
+
+ context->sublevels_up++;
+ result = query_tree_walker((Query *) node, restore_rangetblref_walker,
+ (void *) context, 0);
+ context->sublevels_up--;
+ return result;
+ }
+ return expression_tree_walker(node, restore_rangetblref_walker,
+ (void *) context);
+}
+
+/*
+ * in remove_self_join_rel, ChangeVarNodes may change some of
+ * the RangeTblRef->rtindex in the root Query, here we do the change it back.
+ *
+ * e.g. ChangeVarNodes maybe change
+ * root->parse subnode JoinExpr's larg, rarg (RangeTblRef->rtindex)
+ * from (1,2) to (2,2), we need to change it back to (1,2).
+ * For node JoinExpr, the query tree WALK logic is through larg,then rarg,
+ * we will change the first "2" to "1", not the second.
+ *
+*/
+static void
+restore_rangetblref(Node *node, int rt_index, int new_index, int cnt, int sublevels_up)
+{
+ restore_rangetblref_context context;
+ context.rt_index = rt_index;
+ context.new_index = new_index;
+ context.cnt = cnt;
+ context.sublevels_up = sublevels_up;
+
+ if (node && IsA(node, Query))
+ {
+ Query *qry = (Query *) node;
+ query_tree_walker(qry, restore_rangetblref_walker,
+ (void *) &context, 0);
+ }
+ else
+ restore_rangetblref_walker(node, &context);
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach(lc, joininfos)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach(lc, toRemove->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach(lc, binfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach(lc, jinfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->joininfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures */
+ ChangeVarNodes((Node *) root->parse, toRemove->relid, toKeep->relid, 0);
+ /* restore the rangetblref in a proper order. */
+ restore_rangetblref((Node *) root->parse, toKeep->relid, toRemove->relid, 0, 0);
+
+ /* See remove_self_joins_one_group() */
+ Assert(root->parse->resultRelation != toRemove->relid);
+ Assert(root->parse->resultRelation != toKeep->relid);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ replace_relid(root->all_result_relids, toRemove->relid, toKeep->relid);
+ replace_relid(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ /*
+ * We don't accept result relation as either source or target relation
+ * of SJE, because result relation has different behavior in
+ * EvalPlanQual() and RETURNING clause.
+ */
+ if (root->parse->resultRelation == r)
+ continue;
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ if (root->parse->resultRelation == k)
+ continue;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be removed,
+ * and these relations should not have TABLESAMPLE clauses
+ * specified. Removing a relation with TABLESAMPLE clause could
+ * potentially change the syntax of the query.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5c3..075d36c7ec 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -230,6 +230,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 1c69c6e97e..c5a8d00bb3 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -662,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
@@ -670,6 +670,9 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
* be an RTE corresponding to each setop's output.
+ * Note, we use this not subquery's targetlist but subroot->parse's
+ * targetlist, because it was revised by self-join removal. subquery's
+ * targetlist might contain the references to the removed relids.
*/
if (pNumGroups)
{
@@ -682,7 +685,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index 191f2dc0b1..bc04bc0bb6 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -540,6 +540,33 @@ offset_relid_set(Relids relids, int offset)
return result;
}
+/*
+ * Substitute newId by oldId in relids.
+ *
+ * We must make a copy of the original Bitmapset before making any
+ * modifications, because the same pointer to it might be shared among
+ * different places.
+ * Also, this function can be used in 'delete only' mode (newId < 0).
+ * It allows us to utilise the same code in the remove_useless_joins and the
+ * remove_self_joins features.
+ */
+Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (oldId < 0)
+ return relids;
+
+ /* Delete relid without substitution. */
+ if (newId < 0)
+ return bms_del_member(bms_copy(relids), oldId);
+
+ /* Substitute newId for oldId. */
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
/*
* ChangeVarNodes - adjust Var nodes for a specific change of RT index
*
@@ -637,6 +664,74 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ replace_relid(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ replace_relid(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ replace_relid(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ replace_relid(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ replace_relid(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ replace_relid(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * for SJE (self join elimination), changing varnos will make t1.a = t2.a
+ * to "t1.a = t1.a", which is always true, later processing it will be
+ * optimized out, but we also need to add a not null qual (NullTest).
+ *
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 630ed0f162..40db9d25b4 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -988,6 +988,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 14ccfc1ac1..56b34ae3e3 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -728,7 +728,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -967,7 +967,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3432,4 +3432,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 7b63c5cf71..4381124f6e 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -191,6 +191,8 @@ extern SortGroupClause *get_sortgroupref_clause_noerr(Index sortref,
* output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0020 /* recurse into PlaceHolderVar
* arguments */
+#define PVC_INCLUDE_CONVERTROWTYPES 0x0040 /* include ConvertRowtypeExprs in
+ * output list */
extern Bitmapset *pull_varnos(PlannerInfo *root, Node *node);
extern Bitmapset *pull_varnos_of_level(PlannerInfo *root, Node *node, int levelsup);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 5e88c0224a..fff4b69dfd 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index aafc173792..f2e3fa4c2e 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -108,6 +109,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index ac6d2049e8..11593d36f5 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -41,6 +42,7 @@ typedef enum ReplaceVarsNoMatchOption
} ReplaceVarsNoMatchOption;
+extern Bitmapset *replace_relid(Relids relids, int oldId, int newId);
extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
List *src_rtable, List *src_perminfos);
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fe..3d5de28354 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 53f70d72ed..4e4cec633a 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6213,6 +6213,1070 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: (oid < '10'::oid)
+(6 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+-------------------------------------------
+ HashSetOp Except All
+ -> Append
+ -> Subquery Scan on "*SELECT* 1"
+ -> Seq Scan on emp1 c2
+ -> Subquery Scan on "*SELECT* 2"
+ -> Seq Scan on emp1 c3
+(6 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 729620de13..9da8c5013c 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -153,10 +153,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a3105..77dd964ebf 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index d81ff63be5..3e94e0af53 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2345,6 +2345,492 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 635e6d6e21..74091ba48f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -390,6 +390,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2565,6 +2566,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -3997,6 +3999,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.2
v4-0002-Apply-SJE-to-DML-queries-with-RETURNING-clause.patchtext/x-patch; charset=UTF-8; name=v4-0002-Apply-SJE-to-DML-queries-with-RETURNING-clause.patchDownload
From 3188ff2ee9db2f67290594fd7cdad0d3b1e5359d Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 9 Jul 2024 12:25:23 +0700
Subject: [PATCH 2/2] Apply SJE to DML queries with RETURNING clause
---
src/backend/optimizer/plan/analyzejoins.c | 32 +-
src/test/regress/expected/join.out | 687 +++++++++++++++++-
src/test/regress/expected/updatable_views.out | 17 +-
src/test/regress/sql/join.sql | 270 +++++++
4 files changed, 963 insertions(+), 43 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index bb14597762..7fc59d80cf 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1860,10 +1860,6 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
/* restore the rangetblref in a proper order. */
restore_rangetblref((Node *) root->parse, toKeep->relid, toRemove->relid, 0, 0);
- /* See remove_self_joins_one_group() */
- Assert(root->parse->resultRelation != toRemove->relid);
- Assert(root->parse->resultRelation != toKeep->relid);
-
/* Replace links in the planner info */
remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
@@ -2046,14 +2042,6 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
{
RelOptInfo *inner = root->simple_rel_array[r];
- /*
- * We don't accept result relation as either source or target relation
- * of SJE, because result relation has different behavior in
- * EvalPlanQual() and RETURNING clause.
- */
- if (root->parse->resultRelation == r)
- continue;
-
k = r;
while ((k = bms_next_member(relids, k)) > 0)
@@ -2069,9 +2057,6 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
PlanRowMark *imark = NULL;
List *uclauses = NIL;
- if (root->parse->resultRelation == k)
- continue;
-
/* A sanity check: the relations have the same Oid. */
Assert(root->simple_rte_array[k]->relid ==
root->simple_rte_array[r]->relid);
@@ -2300,6 +2285,23 @@ remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
relids = bms_del_members(relids, group);
+ /*
+ * if one of relations is in the RETURNING statement
+ * we can only apply this optimisation if returning list
+ * contains only resultRelation relid (we don't apply it to
+ * partitioned relations so far) or doesn't contain result
+ * relation at all.
+ */
+ if (root->parse && root->parse->returningList != NIL)
+ {
+ Relids rets = pull_varnos_of_level(
+ root, (Node *) root->parse->returningList, 0);
+
+ if (bms_is_member(root->parse->resultRelation, rets) &&
+ bms_num_members(rets) != 1)
+ continue;
+ }
+
/*
* Try to remove self-joins from a group of identical entries.
* Make the next attempt iteratively - if something is deleted
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 4e4cec633a..b889b2889c 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6440,6 +6440,663 @@ on true;
-> Seq Scan on int8_tbl y
(7 rows)
+--truncate sj, refill data.
+truncate sj;
+insert into sj(a,b,c) select g, g % 10, g + 10 from generate_series(1, 10) g;
+analyze sj;
+------------------- UPDATE SJE (self join elimination) applicable cases.
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a and (sq.b = 2 or sq.a = 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on sj sz
+ -> Seq Scan on sj sz
+ Filter: ((a IS NOT NULL) AND ((b = 2) OR (a = 2)))
+(3 rows)
+
+UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a and (sq.b = 2 or sq.a = 2);
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + t1.a + t1.c FROM t1 WHERE t1.b = sj.b and t1.a = 3 and sj.a = 3;
+ QUERY PLAN
+-----------------------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: ((b IS NOT NULL) AND (a = 3))
+(3 rows)
+
+WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + t1.a + t1.c FROM t1 WHERE t1.b = sj.b and t1.a = 3 and sj.a = 3;
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) UPDATE sj SET a = sj.a + t1.b + t1.a FROM t1 WHERE t1.a = sj.a and t1.a = 4;
+ QUERY PLAN
+-------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a = 4)
+(3 rows)
+
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) UPDATE sj SET a = sj.a + t1.b + t1.a FROM t1 WHERE t1.a = sj.a and t1.a = 4;
+select * from sj order by a;
+ a | b | c
+----+---+----
+ 1 | 1 | 11
+ 2 | 4 | 12
+ 5 | 5 | 15
+ 6 | 6 | 16
+ 7 | 7 | 17
+ 8 | 8 | 18
+ 9 | 9 | 19
+ 10 | 0 | 20
+ 12 | 4 | 14
+ 19 | 3 | 13
+(10 rows)
+
+------------------- UPDATE SJE not applicable cases.
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sz.a = sq.a or (sq.b = 2 or sq.a = 2);
+ QUERY PLAN
+------------------------------------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: ((sz.a = sq.a) OR (sq.b = 2) OR (sq.a = 2))
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.b and sq.b = sz.b and sz.b = sq.a;
+ QUERY PLAN
+------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.b)
+ -> Seq Scan on sj sq
+ Filter: (a = b)
+ -> Seq Scan on sj sz
+(6 rows)
+
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.b = sz.b and sz.a = 2 and sq.a = 3;
+ QUERY PLAN
+------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.b = sz.b)
+ -> Seq Scan on sj sq
+ Filter: (a = 3)
+ -> Seq Scan on sj sz
+ Filter: (a = 2)
+(7 rows)
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + 1 FROM t1 WHERE t1.b = sj.b and t1.a = 2 and sj.a = 3;
+ QUERY PLAN
+--------------------------------------
+ Update on sj
+ -> Nested Loop
+ Join Filter: (sj.b = sj_1.b)
+ -> Seq Scan on sj
+ Filter: (a = 3)
+ -> Seq Scan on sj sj_1
+ Filter: (a = 2)
+(7 rows)
+
+------------------- DELETE SJE applicable cases.
+EXPLAIN (COSTS OFF) DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and sq.a = 5;
+ QUERY PLAN
+-------------------------
+ Delete on sj sz
+ -> Seq Scan on sj sz
+ Filter: (a = 5)
+(3 rows)
+
+DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and sq.a = 5;
+EXPLAIN (COSTS OFF) DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and (sz.a = 6 or sz.b = 113);
+ QUERY PLAN
+--------------------------------------------------------------
+ Delete on sj sz
+ -> Seq Scan on sj sz
+ Filter: ((a IS NOT NULL) AND ((a = 6) OR (b = 113)))
+(3 rows)
+
+DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and (sz.a = 6 or sz.b = 113);
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 7 and sq.a = 7;
+ QUERY PLAN
+-----------------------------------------------
+ Delete on sj t1
+ -> Seq Scan on sj t1
+ Filter: ((b IS NOT NULL) AND (a = 7))
+(3 rows)
+
+WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 7 and sq.a = 7;
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 8;
+ QUERY PLAN
+-------------------------
+ Delete on sj
+ -> Seq Scan on sj
+ Filter: (a = 8)
+(3 rows)
+
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 8;
+select * from sj order by a;
+ a | b | c
+----+---+----
+ 1 | 1 | 11
+ 2 | 4 | 12
+ 9 | 9 | 19
+ 10 | 0 | 20
+ 12 | 4 | 14
+ 19 | 3 | 13
+(6 rows)
+
+------------------- DELETE SJE not applicable cases.
+EXPLAIN (COSTS OFF) DELETE FROM sj sq USING sj as sz WHERE sq.a = sz.b and sq.b = sz.b and sz.b = sq.a;
+ QUERY PLAN
+------------------------------------
+ Delete on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.b)
+ -> Seq Scan on sj sq
+ Filter: (a = b)
+ -> Seq Scan on sj sz
+(6 rows)
+
+EXPLAIN (COSTS OFF) DELETE FROM sj sq USING sj as sz WHERE sq.b = sz.b and sz.a = 2 and sq.a = 3;
+ QUERY PLAN
+------------------------------------
+ Delete on sj sq
+ -> Nested Loop
+ Join Filter: (sq.b = sz.b)
+ -> Seq Scan on sj sq
+ Filter: (a = 3)
+ -> Seq Scan on sj sz
+ Filter: (a = 2)
+(7 rows)
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 3;
+ QUERY PLAN
+------------------------------------
+ Delete on sj sq
+ -> Nested Loop
+ Join Filter: (sq.b = t1.b)
+ -> Seq Scan on sj t1
+ Filter: (a = 3)
+ -> Seq Scan on sj sq
+(6 rows)
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.a = sq.c;
+ QUERY PLAN
+-------------------------------------
+ Delete on sj sq
+ -> Nested Loop
+ Join Filter: (sq.c = t1.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj t1
+(6 rows)
+
+---UPDATE/DELETE RETURNING SJE applicable cases.
+truncate sj;
+insert into sj(a,b,c) select g, g % 10, g + 10 from generate_series(1, 10) g;
+analyze sj;
+----test query that can use SJE
+explain(costs off)
+UPDATE sj sq
+SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM sj sz WHERE sq.a = sz.a and sz.a = 2 and sq.a = 2
+returning sq.a, sq.b, (select sq.c);
+ QUERY PLAN
+-------------------------
+ Update on sj sz
+ -> Seq Scan on sj sz
+ Filter: (a = 2)
+ SubPlan 1
+ -> Result
+(5 rows)
+
+explain(costs off)
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.a and sz.a = 3
+returning sq.*;
+ QUERY PLAN
+-------------------------
+ Update on sj s2
+ -> Seq Scan on sj s2
+ Filter: (a = 3)
+(3 rows)
+
+explain(costs off)
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sz.a = 4 and sq.a = 4
+returning sq.*, (select sq.a);
+ QUERY PLAN
+------------------------------------
+ Update on sj s1
+ -> Nested Loop
+ Join Filter: (s1.b = s2.b)
+ -> Seq Scan on sj s1
+ Filter: (a = 4)
+ -> Seq Scan on sj s2
+ SubPlan 1
+ -> Result
+(8 rows)
+
+explain(costs off)
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 5
+returning sq.a, sq.a * sq.b, (select sq.a + sq.b);
+ QUERY PLAN
+-------------------------
+ Delete on sj
+ -> Seq Scan on sj
+ Filter: (a = 5)
+ SubPlan 1
+ -> Result
+(5 rows)
+
+explain(costs off)
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sq.a = sz.a and sq.a = 6
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+ QUERY PLAN
+------------------------------------
+ Delete on sj s1
+ -> Nested Loop
+ Join Filter: (s1.b = s2.b)
+ -> Seq Scan on sj s1
+ Filter: (a = 6)
+ -> Seq Scan on sj s2
+(6 rows)
+
+explain(costs off)
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.b and sz.a = 7
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+ QUERY PLAN
+------------------------------------
+ Delete on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = s2.b)
+ -> Seq Scan on sj s2
+ Filter: (a = 7)
+ -> Seq Scan on sj sq
+(6 rows)
+
+----actually execute the query to validate the result
+UPDATE sj sq
+SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM sj sz WHERE sq.a = sz.a and sz.a = 2 and sq.a = 2
+returning sq.a, sq.b, (select sq.c);
+ a | b | c
+----+----+----
+ 17 | 17 | 32
+(1 row)
+
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.a and sz.a = 3
+returning sq.*;
+ a | b | c
+----+----+----
+ 18 | 18 | 38
+(1 row)
+
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sz.a = 4 and sq.a = 4
+returning sq.*, (select sq.a);
+ a | b | c | a
+----+----+----+----
+ 19 | 19 | 44 | 19
+(1 row)
+
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 5
+returning sq.a, sq.a * sq.b, (select sq.a + sq.b);
+ a | ?column? | ?column?
+---+----------+----------
+ 5 | 25 | 10
+(1 row)
+
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sq.a = sz.a and sq.a = 6
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+ a | b | c | tbl
+---+---+----+-----
+ 6 | 6 | 16 | sj
+(1 row)
+
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.b and sz.a = 7
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+ a | b | c | tbl
+---+---+----+-----
+ 7 | 7 | 17 | sj
+(1 row)
+
+select * from sj order by a;
+ a | b | c
+----+----+----
+ 1 | 1 | 11
+ 8 | 8 | 18
+ 9 | 9 | 19
+ 10 | 0 | 20
+ 17 | 17 | 32
+ 18 | 18 | 38
+ 19 | 19 | 44
+(7 rows)
+
+------------------- MERGE SJE table setup
+CREATE TABLE sj_target (tid integer primary key, balance integer) WITH (autovacuum_enabled=off);
+INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5, 50), (6, 60);
+create view rw_sj_target as select * from sj_target where tid >= 2;
+create or replace view no_rw_sj_target as select * from sj_target where balance = 60 limit 1;
+--cannot use SJE for RETURNING
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN update set balance = t.balance + 11 RETURNING *;
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop
+ -> Seq Scan on sj_target t
+ -> Index Scan using sj_target_pkey on sj_target s
+ Index Cond: (tid = t.tid)
+(5 rows)
+
+--cannot use SJE for merge insert
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN NOT MATCHED AND s.balance = 10
+ THEN INSERT VALUES (s.tid, s.balance);
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop Left Join
+ -> Seq Scan on sj_target s
+ -> Index Scan using sj_target_pkey on sj_target t
+ Index Cond: (tid = s.tid)
+(5 rows)
+
+--cannot use SJE for merge with non-updatable view
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING no_rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 60
+ THEN update set balance = t.balance + 6;
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop
+ -> Subquery Scan on s
+ -> Limit
+ -> Seq Scan on sj_target
+ Filter: (balance = 60)
+ -> Index Scan using sj_target_pkey on sj_target t
+ Index Cond: (tid = s.tid)
+(8 rows)
+
+---- the following cases can apply SJE with merge.
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 20
+ THEN update set balance = t.balance + 2;
+ QUERY PLAN
+-------------------------------------------------
+ Merge on sj_target
+ -> Bitmap Heap Scan on sj_target
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(5 rows)
+
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN update set balance = t.balance + 2;
+ QUERY PLAN
+-------------------------------------------------
+ Merge on sj_target
+ -> Bitmap Heap Scan on sj_target
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(5 rows)
+
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30
+ THEN update set balance = t.balance + 3;
+ QUERY PLAN
+-------------------------------------------------
+ Merge on sj_target
+ -> Bitmap Heap Scan on sj_target
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(5 rows)
+
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40
+ THEN update set balance = t.balance + 4;
+ QUERY PLAN
+-------------------------------
+ Merge on sj_target s
+ -> Seq Scan on sj_target s
+(2 rows)
+
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50
+ THEN DELETE;
+ QUERY PLAN
+-------------------------------
+ Merge on sj_target s
+ -> Seq Scan on sj_target s
+(2 rows)
+
+--- and run the actual query.
+MERGE INTO sj_target t USING no_rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 60 THEN update set balance = t.balance + 6;
+MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 20 THEN update set balance = t.balance + 2;
+MERGE INTO rw_sj_target t USING rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30 THEN update set balance = t.balance + 3;
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40 THEN update set balance = t.balance + 4;
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50 THEN DELETE;
+MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10 THEN update set balance = t.balance + 1;
+select * from sj_target order by tid;
+ tid | balance
+-----+---------
+ 1 | 10
+ 2 | 22
+ 3 | 33
+ 4 | 44
+ 6 | 66
+(5 rows)
+
+-------------------MERGRE RETURNING SJE TEST--------------------------------
+TRUNCATE sj_target;
+INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5, 50), (6, 60);
+--"when matched" and "when not matched" both specified
+--because of "when not matched" then can only use left join
+--therefore cannot apply SJE
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN UPDATE SET balance = t.balance + s.balance, tid = t.tid + s.tid
+ WHEN NOT MATCHED BY SOURCE AND t.balance = 20 THEN DELETE
+ RETURNING t.*, (select t.balance), merge_action();
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop Left Join
+ -> Seq Scan on sj_target t
+ -> Index Scan using sj_target_pkey on sj_target s
+ Index Cond: (tid = t.tid)
+ SubPlan 1
+ -> Result
+(7 rows)
+
+---"when not matched" using left join, SJE cannot be applied
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+WHEN NOT MATCHED BY SOURCE AND t.balance = 10 THEN
+ DELETE
+returning t.*;
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop Left Join
+ -> Seq Scan on sj_target t
+ -> Index Scan using sj_target_pkey on sj_target s
+ Index Cond: (tid = t.tid)
+(5 rows)
+
+---"when not matched by target" using left join, SJE cannot be applied
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target s ON t.tid = s.tid
+ WHEN NOT MATCHED THEN
+ INSERT VALUES (s.tid, 11)
+ RETURNING t.*, merge_action();
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop Left Join
+ -> Seq Scan on sj_target s
+ -> Index Scan using sj_target_pkey on sj_target t
+ Index Cond: (tid = s.tid)
+(5 rows)
+
+---returning with multiple rel, merge returning cannot be applied.
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50
+ THEN UPDATE SET balance = t.balance + s.balance + 50
+ RETURNING s.*, t.balance, merge_action();
+ QUERY PLAN
+------------------------------------------------------------
+ Merge on sj_target t
+ -> Nested Loop
+ -> Seq Scan on sj_target t
+ -> Index Scan using sj_target_pkey on sj_target s
+ Index Cond: (tid = t.tid)
+(5 rows)
+
+--------the following (explain) query can use SJE.
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (select t.balance), merge_action();
+ QUERY PLAN
+-------------------------------
+ Merge on sj_target s
+ -> Seq Scan on sj_target s
+ SubPlan 1
+ -> Result
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+MERGE INTO rw_sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND s.balance = 20
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+ QUERY PLAN
+-------------------------------------------------------------
+ Merge on sj_target s0
+ -> Nested Loop
+ Join Filter: (s0.balance = s1.balance)
+ -> Seq Scan on sj_target s1
+ -> Materialize
+ -> Bitmap Heap Scan on sj_target s0
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(9 rows)
+
+EXPLAIN (COSTS OFF)
+MERGE INTO rw_sj_target t USING (select s1.tid, s1.balance from sj_target s0 join rw_sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+ QUERY PLAN
+-------------------------------------------------------------
+ Merge on sj_target
+ -> Nested Loop
+ Join Filter: (s0.balance = sj_target.balance)
+ -> Seq Scan on sj_target s0
+ -> Materialize
+ -> Bitmap Heap Scan on sj_target
+ Recheck Cond: (tid >= 2)
+ -> Bitmap Index Scan on sj_target_pkey
+ Index Cond: (tid >= 2)
+(9 rows)
+
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40 AND s.balance = 40
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (SELECT t.tableoid::regclass), merge_action();
+ QUERY PLAN
+------------------------------------------------
+ Merge on sj_target s0
+ -> Nested Loop
+ Join Filter: (s0.balance = s1.balance)
+ -> Seq Scan on sj_target s0
+ -> Materialize
+ -> Seq Scan on sj_target s1
+ SubPlan 1
+ -> Result
+(8 rows)
+
+--------and actually running query validate the result.
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (select t.balance), merge_action();
+ tid | balance | balance | merge_action
+-----+---------+---------+--------------
+ 12 | 120 | 120 | UPDATE
+(1 row)
+
+MERGE INTO rw_sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND s.balance = 20
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+ tid | balance | merge_action
+-----+---------+--------------
+ 14 | 140 | UPDATE
+(1 row)
+
+MERGE INTO rw_sj_target t USING (select s1.tid, s1.balance from sj_target s0 join rw_sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+ tid | balance | merge_action
+-----+---------+--------------
+ 16 | 160 | UPDATE
+(1 row)
+
+MERGE INTO sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40 AND s.balance = 40
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (SELECT t.tableoid::regclass), merge_action();
+ tid | balance | tableoid | merge_action
+-----+---------+-----------+--------------
+ 18 | 180 | sj_target | UPDATE
+(1 row)
+
+select * from sj_target;
+ tid | balance
+-----+---------
+ 5 | 50
+ 6 | 60
+ 12 | 120
+ 14 | 140
+ 16 | 160
+ 18 | 180
+(6 rows)
+
+DROP VIEW no_rw_sj_target;
+DROP VIEW rw_sj_target;
+DROP TABLE sj_target;
+truncate sj;
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
-- Test that references to the removed rel in lateral subqueries are replaced
-- correctly after join removal
explain (verbose, costs off)
@@ -7058,29 +7715,23 @@ WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
TRUNCATE emp1;
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
- QUERY PLAN
--------------------------------------
- Update on sj sq
- -> Nested Loop
- Join Filter: (sq.a = sz.a)
- -> Seq Scan on sj sq
- -> Materialize
- -> Seq Scan on sj sz
-(6 rows)
+ QUERY PLAN
+---------------------------------
+ Update on sj sz
+ -> Seq Scan on sj sz
+ Filter: (a IS NOT NULL)
+(3 rows)
CREATE RULE sj_del_rule AS ON DELETE TO sj
DO INSTEAD
UPDATE sj SET a = 1 WHERE a = old.a;
EXPLAIN (COSTS OFF) DELETE FROM sj;
- QUERY PLAN
---------------------------------------
- Update on sj sj_1
- -> Nested Loop
- Join Filter: (sj.a = sj_1.a)
- -> Seq Scan on sj sj_1
- -> Materialize
- -> Seq Scan on sj
-(6 rows)
+ QUERY PLAN
+---------------------------------
+ Update on sj
+ -> Seq Scan on sj
+ Filter: (a IS NOT NULL)
+(3 rows)
DROP RULE sj_del_rule ON sj CASCADE;
-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 9c21b76800..2c6487818a 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -3111,16 +3111,13 @@ SELECT * FROM rw_view1;
(1 row)
EXPLAIN (costs off) DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
- QUERY PLAN
--------------------------------------------------------------------
- Update on base_tbl base_tbl_1
- -> Nested Loop
- -> Index Scan using base_tbl_pkey on base_tbl base_tbl_1
- Index Cond: (id = 1)
- -> Index Scan using base_tbl_pkey on base_tbl
- Index Cond: (id = 1)
- Filter: ((NOT deleted) AND snoop(data))
-(7 rows)
+ QUERY PLAN
+--------------------------------------------------
+ Update on base_tbl
+ -> Index Scan using base_tbl_pkey on base_tbl
+ Index Cond: (id = 1)
+ Filter: ((NOT deleted) AND snoop(data))
+(4 rows)
DELETE FROM rw_view1 WHERE id = 1 AND snoop(data);
NOTICE: snooped value: Row 1
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3e94e0af53..981f9a8c47 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2450,6 +2450,276 @@ left join (select coalesce(y.q1, 1) from int8_tbl y
on true) z
on true;
+--truncate sj, refill data.
+truncate sj;
+insert into sj(a,b,c) select g, g % 10, g + 10 from generate_series(1, 10) g;
+analyze sj;
+------------------- UPDATE SJE (self join elimination) applicable cases.
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a and (sq.b = 2 or sq.a = 2);
+
+UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a and (sq.b = 2 or sq.a = 2);
+
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + t1.a + t1.c FROM t1 WHERE t1.b = sj.b and t1.a = 3 and sj.a = 3;
+
+WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + t1.a + t1.c FROM t1 WHERE t1.b = sj.b and t1.a = 3 and sj.a = 3;
+
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) UPDATE sj SET a = sj.a + t1.b + t1.a FROM t1 WHERE t1.a = sj.a and t1.a = 4;
+
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) UPDATE sj SET a = sj.a + t1.b + t1.a FROM t1 WHERE t1.a = sj.a and t1.a = 4;
+
+select * from sj order by a;
+------------------- UPDATE SJE not applicable cases.
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sz.a = sq.a or (sq.b = 2 or sq.a = 2);
+
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.b and sq.b = sz.b and sz.b = sq.a;
+
+EXPLAIN (COSTS OFF) UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.b = sz.b and sz.a = 2 and sq.a = 3;
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) UPDATE sj SET a = sj.a + 1 FROM t1 WHERE t1.b = sj.b and t1.a = 2 and sj.a = 3;
+
+------------------- DELETE SJE applicable cases.
+EXPLAIN (COSTS OFF) DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and sq.a = 5;
+
+DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and sq.a = 5;
+
+EXPLAIN (COSTS OFF) DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and (sz.a = 6 or sz.b = 113);
+
+DELETE FROM sj sq using sj as sz WHERE sz.a = sq.a and (sz.a = 6 or sz.b = 113);
+
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 7 and sq.a = 7;
+
+WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 7 and sq.a = 7;
+
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 8;
+
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 8;
+
+select * from sj order by a;
+------------------- DELETE SJE not applicable cases.
+EXPLAIN (COSTS OFF) DELETE FROM sj sq USING sj as sz WHERE sq.a = sz.b and sq.b = sz.b and sz.b = sq.a;
+
+EXPLAIN (COSTS OFF) DELETE FROM sj sq USING sj as sz WHERE sq.b = sz.b and sz.a = 2 and sq.a = 3;
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.b = sq.b and t1.a = 3;
+
+EXPLAIN (COSTS OFF) WITH t1 AS (SELECT * FROM sj) DELETE FROM sj sq USING sj as t1 WHERE t1.a = sq.c;
+
+---UPDATE/DELETE RETURNING SJE applicable cases.
+truncate sj;
+insert into sj(a,b,c) select g, g % 10, g + 10 from generate_series(1, 10) g;
+analyze sj;
+
+----test query that can use SJE
+explain(costs off)
+UPDATE sj sq
+SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM sj sz WHERE sq.a = sz.a and sz.a = 2 and sq.a = 2
+returning sq.a, sq.b, (select sq.c);
+
+explain(costs off)
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.a and sz.a = 3
+returning sq.*;
+
+explain(costs off)
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sz.a = 4 and sq.a = 4
+returning sq.*, (select sq.a);
+
+explain(costs off)
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 5
+returning sq.a, sq.a * sq.b, (select sq.a + sq.b);
+
+explain(costs off)
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sq.a = sz.a and sq.a = 6
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+
+explain(costs off)
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.b and sz.a = 7
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+
+----actually execute the query to validate the result
+UPDATE sj sq
+SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM sj sz WHERE sq.a = sz.a and sz.a = 2 and sq.a = 2
+returning sq.a, sq.b, (select sq.c);
+
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.a and sz.a = 3
+returning sq.*;
+
+UPDATE sj sq SET c = sz.a + sq.a + sz.b + sq.b + sz.c + sq.c, a = sz.a + 15, b = sq.a + 15
+FROM (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sz.a = 4 and sq.a = 4
+returning sq.*, (select sq.a);
+
+WITH t1 AS (SELECT *, (select count(*) from sj) FROM sj) DELETE FROM sj sq using t1 as sz where sq.a = sz.a and sz.a = 5
+returning sq.a, sq.a * sq.b, (select sq.a + sq.b);
+
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.b = s2.b) as sz
+WHERE sq.a = sz.a and sq.a = 6
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+
+delete from sj sq using (select s1.b, s1.a as a, s2.a as a1, s2.c from sj s1 join sj s2 on s1.a = s2.a) as sz
+WHERE sq.a = sz.b and sz.a = 7
+returning sq.a, sq.b, sq.c, sq.tableoid::regclass as tbl;
+
+select * from sj order by a;
+------------------- MERGE SJE table setup
+CREATE TABLE sj_target (tid integer primary key, balance integer) WITH (autovacuum_enabled=off);
+INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5, 50), (6, 60);
+create view rw_sj_target as select * from sj_target where tid >= 2;
+create or replace view no_rw_sj_target as select * from sj_target where balance = 60 limit 1;
+
+--cannot use SJE for RETURNING
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN update set balance = t.balance + 11 RETURNING *;
+--cannot use SJE for merge insert
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN NOT MATCHED AND s.balance = 10
+ THEN INSERT VALUES (s.tid, s.balance);
+
+--cannot use SJE for merge with non-updatable view
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING no_rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 60
+ THEN update set balance = t.balance + 6;
+
+---- the following cases can apply SJE with merge.
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 20
+ THEN update set balance = t.balance + 2;
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN update set balance = t.balance + 2;
+EXPLAIN (COSTS OFF) MERGE INTO rw_sj_target t USING rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30
+ THEN update set balance = t.balance + 3;
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40
+ THEN update set balance = t.balance + 4;
+EXPLAIN (COSTS OFF) MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50
+ THEN DELETE;
+
+--- and run the actual query.
+MERGE INTO sj_target t USING no_rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 60 THEN update set balance = t.balance + 6;
+
+MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 20 THEN update set balance = t.balance + 2;
+
+MERGE INTO rw_sj_target t USING rw_sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30 THEN update set balance = t.balance + 3;
+
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40 THEN update set balance = t.balance + 4;
+
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50 THEN DELETE;
+
+MERGE INTO rw_sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10 THEN update set balance = t.balance + 1;
+
+select * from sj_target order by tid;
+-------------------MERGRE RETURNING SJE TEST--------------------------------
+TRUNCATE sj_target;
+INSERT INTO sj_target VALUES (1, 10),(2, 20), (3, 30), (4, 40),(5, 50), (6, 60);
+
+--"when matched" and "when not matched" both specified
+--because of "when not matched" then can only use left join
+--therefore cannot apply SJE
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN UPDATE SET balance = t.balance + s.balance, tid = t.tid + s.tid
+ WHEN NOT MATCHED BY SOURCE AND t.balance = 20 THEN DELETE
+ RETURNING t.*, (select t.balance), merge_action();
+
+---"when not matched" using left join, SJE cannot be applied
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+WHEN NOT MATCHED BY SOURCE AND t.balance = 10 THEN
+ DELETE
+returning t.*;
+
+---"when not matched by target" using left join, SJE cannot be applied
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target s ON t.tid = s.tid
+ WHEN NOT MATCHED THEN
+ INSERT VALUES (s.tid, 11)
+ RETURNING t.*, merge_action();
+
+---returning with multiple rel, merge returning cannot be applied.
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 50
+ THEN UPDATE SET balance = t.balance + s.balance + 50
+ RETURNING s.*, t.balance, merge_action();
+
+--------the following (explain) query can use SJE.
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (select t.balance), merge_action();
+
+EXPLAIN (COSTS OFF)
+MERGE INTO rw_sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND s.balance = 20
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+
+EXPLAIN (COSTS OFF)
+MERGE INTO rw_sj_target t USING (select s1.tid, s1.balance from sj_target s0 join rw_sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+
+EXPLAIN (COSTS OFF)
+MERGE INTO sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40 AND s.balance = 40
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (SELECT t.tableoid::regclass), merge_action();
+--------and actually running query validate the result.
+MERGE INTO sj_target t USING sj_target AS s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 10
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (select t.balance), merge_action();
+
+MERGE INTO rw_sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND s.balance = 20
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+
+MERGE INTO rw_sj_target t USING (select s1.tid, s1.balance from sj_target s0 join rw_sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 30
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, merge_action();
+
+MERGE INTO sj_target t USING (select s0.tid, s1.balance from sj_target s0 join sj_target s1 on s0.balance = s1.balance) s ON t.tid = s.tid
+ WHEN MATCHED AND t.balance = 40 AND s.balance = 40
+ THEN UPDATE SET balance = t.balance + s.balance + 100, tid = t.tid + s.tid + 10
+ RETURNING t.*, (SELECT t.tableoid::regclass), merge_action();
+
+select * from sj_target;
+
+DROP VIEW no_rw_sj_target;
+DROP VIEW rw_sj_target;
+DROP TABLE sj_target;
+
+truncate sj;
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
-- Test that references to the removed rel in lateral subqueries are replaced
-- correctly after join removal
explain (verbose, costs off)
--
2.39.2
On Tue, Jul 9, 2024 at 2:06 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 7/2/24 07:25, jian he wrote:
to make sure it's correct, I have added a lot of tests,
Some of this may be contrived, maybe some of the tests are redundant.Thanks for your job!
I passed through the patches and have some notes:
1. Patch 0001 has not been applied anymore since the previous week's
changes in the core. Also, there is one place with trailing whitespace.
thanks.
because the previous thread mentioned the EPQ problem.
in remove_useless_self_joins, i make it can only process CMD_SELECT query.
also thanks to Alexander Korotkov's tip.
I added a bool change_RangeTblRef to ChangeVarNodes_context.
default is true, so won't influence ChangeVarNodes.
After that create a function ChangeVarNodesExtended.
so now replace_varno replaced by ChangeVarNodes.
now in ChangeVarNodes_walker we've add:
```
if (IsA(node, RestrictInfo))
{
RestrictInfo *rinfo = (RestrictInfo *) node;
int relid = -1;
bool is_req_equal =
(rinfo->required_relids == rinfo->clause_relids);
bool clause_relids_is_multiple =
(bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
...
}
```
but this part, we don't have much comments, adding some comments would be good.
but I am not sure how.
static bool
match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
Index relid)
but actually we call it via:
if (!match_unique_clauses(root, inner, uclauses, outer->relid))
I am not sure whether the second argument is "inner" or "outer".
Maybe it will cause confusion.
same with innerrel_is_unique_ext.
/*
* At this stage, joininfo lists of inner and outer can contain
* only clauses, required for a superior outer join that can't
* influence this optimization. So, we can avoid to call the
* build_joinrel_restrictlist() routine.
*/
restrictlist = generate_join_implied_equalities(root, joinrelids,
inner->relids,
outer, NULL);
build_joinrel_restrictlist require joinrel, innerrel, outrel, but here
we only have innerrel, outterrel.
so i am confused with the comments.
i add following code snippets after generate_join_implied_equalities
```
if (restrictlist == NIL)
continue
```
I have some confusion with the comments.
/*
* Determine if the inner table can duplicate outer rows. We must
* bypass the unique rel cache here since we're possibly using a
* subset of join quals. We can use 'force_cache' == true when all
* join quals are self-join quals. Otherwise, we could end up
* putting false negatives in the cache.
*/
if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
outer, JOIN_INNER, selfjoinquals,
list_length(otherjoinquals) == 0,
&uclauses))
continue;
"unique rel cache", not sure the meaning, obviously, "relcache" has a
different meaning.
so i am slightly confused with
"We must bypass the unique rel cache here since we're possibly using a
subset of join quals"
i have refactored comments below
```
if (!match_unique_clauses(root, inner, uclauses, outer->relid))
```.
please check v5-0002 for comments refactor.
Attachments:
v5-0002-refactor-comments-in-remove_self_joins_one_group.patchtext/x-patch; charset=US-ASCII; name=v5-0002-refactor-comments-in-remove_self_joins_one_group.patchDownload
From a3b12e6b76205de7063298bf28e12e0ead8879cf Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 11 Jul 2024 15:11:28 +0800
Subject: [PATCH v5 2/2] refactor comments in remove_self_joins_one_group
---
src/backend/optimizer/plan/analyzejoins.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 27bca772..1b9d66dd 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -2096,9 +2096,13 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
continue;
/*
- * We have proven that for both relations, the same unique index
- * guarantees that there is at most one row where columns equal
- * given values. These values must be the same for both relations,
+ * uclauses is the copy of outer->baserestrictinfo that associated with a index.
+ * We already proven that based on {outer->baserestrictinfo, selfjoinquals}
+ * comparsion with unique index we can guarantees that for the outer rel there is
+ * at most one row where columns equal given values. Sometimes that is not ok. e.g.
+ * "where s1.b = s2.b and s1.a = 1 and s2.a = 2". (the index is based on column(a,b).
+ * in these scarenio, we also need to validate equality of outer->baserestrictinfo
+ * with inner->outer->baserestrictinfo,
* or else we won't match the same row on each side of the join.
*/
if (!match_unique_clauses(root, inner, uclauses, outer->relid))
--
2.34.1
v5-0001-Remove-useless-self-joins.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Remove-useless-self-joins.patchDownload
From 053554613576f60f8503b06d4a57925c421a9f9b Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Thu, 11 Jul 2024 15:32:19 +0800
Subject: [PATCH v5 1/1] Remove useless self-joins
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/equivclass.c | 3 +-
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1150 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 7 +-
src/backend/rewrite/rewriteManip.c | 150 ++-
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 35 +-
src/include/optimizer/optimizer.h | 2 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/include/rewrite/rewriteManip.h | 4 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1064 +++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 486 +++++++++
src/tools/pgindent/typedefs.list | 3 +
19 files changed, 2948 insertions(+), 84 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b14c5d81..27ec59c6 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5612,6 +5612,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 51d80632..b1863726 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -856,7 +856,8 @@ find_computable_ec_member(PlannerInfo *root,
exprvars = pull_var_clause((Node *) em->em_expr,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ PVC_INCLUDE_PLACEHOLDERS |
+ PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc2, exprvars)
{
if (!is_exprlist_member(lfirst(lc2), exprs))
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c0fcc7d7..2230b131 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3440,6 +3440,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3495,6 +3511,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3546,6 +3563,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3588,7 +3623,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index c3fd4a81..27bca772 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -29,12 +30,31 @@
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. It is needed to perform a sorting procedure and simplify
+ * the search of SJE-candidate baserels referencing the same database relation.
+ * Having collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts to
+ * remove self-joins.
+ * Preliminary sorting prevents quadratic behaviour that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -42,14 +62,16 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -87,7 +109,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -291,8 +313,8 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
continue; /* not mergejoinable */
/*
- * Check if the clause has the form "outer op inner" or "inner op
- * outer", and if so mark which side is inner.
+ * Check if clause has the form "outer op inner" or "inner op outer",
+ * and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
innerrel->relids))
@@ -306,7 +328,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -318,31 +340,27 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
- * Remove references to the rel from other baserels' attr_needed arrays.
+ * Remove references to the rel from other baserels' attr_needed arrays
+ * and lateral_vars lists.
*/
for (rti = 1; rti < root->simple_rel_array_size; rti++)
{
@@ -364,19 +382,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ replace_relid(otherrel->attr_needed[attroff], relid, subst);
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], ojrelid);
+ replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
+
+ /* Update lateral_vars list. */
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -390,30 +412,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- /*
- * initsplan.c is fairly cavalier about allowing SpecialJoinInfos'
- * lefthand/righthand relid sets to be shared with other data
- * structures. Ensure that we don't modify the original relid sets.
- * (The commute_xxx sets are always per-SpecialJoinInfo though.)
- */
- sjinf->min_lefthand = bms_copy(sjinf->min_lefthand);
- sjinf->min_righthand = bms_copy(sjinf->min_righthand);
- sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
- sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
+
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
}
/*
@@ -434,10 +449,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -447,18 +462,58 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
/*
* Remove any joinquals referencing the rel from the joininfo lists.
@@ -856,9 +911,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * outer_exprs contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the outer_exprs, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -871,10 +931,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1188,9 +1249,33 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
+ * additional clauses from a baserestrictinfo list that were used to prove
+ * uniqueness. A non NULL 'extra_clauses' indicates that we're checking
+ * for self-join and correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1205,17 +1290,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels, because we need extra clauses to be valid for our case.
+ * Also, for self-join checking we've filtered the clauses list. Thus,
+ * for a self-join search, we can match only the result cached for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1232,7 +1328,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1245,10 +1342,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1294,7 +1397,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1324,17 +1428,909 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+ ListCell *lc;
+ ListCell *lc1;
+
+ foreach(lc, ec->ec_members)
+ {
+ EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach(lc1, new_members)
+ {
+ EquivalenceMember *other = lfirst_node(EquivalenceMember, lc1);
+
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach(lc, ec->ec_sources)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach(lc1, new_sources)
+ {
+ RestrictInfo *other = lfirst_node(RestrictInfo, lc1);
+
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+ /* SJE only support for SELECT now, resultRelation should be 0. */
+ Assert(root->parse->resultRelation != toRemove->relid);
+ Assert(root->parse->resultRelation != toKeep->relid);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach(lc, joininfos)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach(lc, toRemove->baserestrictinfo)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach(lc, binfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->baserestrictinfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach(lc, jinfo_candidates)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ ListCell *olc = NULL;
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach(olc, toKeep->joininfo)
+ {
+ RestrictInfo *src = lfirst_node(RestrictInfo, olc);
+
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures, except nodes RangeTblRef
+ * otherwise later remove_rel_from_joinlist will yield errors.
+ */
+ ChangeVarNodesExtended((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ replace_relid(root->all_result_relids, toRemove->relid, toKeep->relid);
+ replace_relid(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ ListCell *lc;
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach(lc, joinquals)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ ListCell *lc;
+
+ foreach(lc, uclauses)
+ {
+ RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+ ListCell *olc;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach(olc, outer->baserestrictinfo)
+ {
+ RestrictInfo *orinfo = lfirst_node(RestrictInfo, olc);
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+
+ if (restrictlist == NIL)
+ continue;
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ Assert(list_length(restrictlist) ==
+ (list_length(selfjoinquals) + list_length(otherjoinquals)));
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * We have proven that for both relations, the same unique index
+ * guarantees that there is at most one row where columns equal
+ * given values. These values must be the same for both relations,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be removed,
+ * and these relations should not have TABLESAMPLE clauses
+ * specified. Removing a relation with TABLESAMPLE clause could
+ * potentially change the syntax of the query.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || joinlist == NIL ||
+ (root->parse->commandType != CMD_SELECT) ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5..075d36c7 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -230,6 +230,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 1c69c6e9..c5a8d00b 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -662,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
@@ -670,6 +670,9 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
* be an RTE corresponding to each setop's output.
+ * Note, we use this not subquery's targetlist but subroot->parse's
+ * targetlist, because it was revised by self-join removal. subquery's
+ * targetlist might contain the references to the removed relids.
*/
if (pNumGroups)
{
@@ -682,7 +685,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index 191f2dc0..00971090 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -540,6 +540,33 @@ offset_relid_set(Relids relids, int offset)
return result;
}
+/*
+ * Substitute newId by oldId in relids.
+ *
+ * We must make a copy of the original Bitmapset before making any
+ * modifications, because the same pointer to it might be shared among
+ * different places.
+ * Also, this function can be used in 'delete only' mode (newId < 0).
+ * It allows us to utilise the same code in the remove_useless_joins and the
+ * remove_self_joins features.
+ */
+Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (oldId < 0)
+ return relids;
+
+ /* Delete relid without substitution. */
+ if (newId < 0)
+ return bms_del_member(bms_copy(relids), oldId);
+
+ /* Substitute newId for oldId. */
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
/*
* ChangeVarNodes - adjust Var nodes for a specific change of RT index
*
@@ -548,6 +575,7 @@ offset_relid_set(Relids relids, int offset)
* to 'new_index'. The varnosyn fields are changed too. Also, adjust other
* nodes that contain rangetable indexes, such as RangeTblRef and JoinExpr.
*
+ * change_RangeTblRef: do we change RangeTblRef or not. see ChangeVarNodesExtended.
* NOTE: although this has the form of a walker, we cheat and modify the
* nodes in-place. The given expression tree should have been copied
* earlier to ensure that no unwanted side-effects occur!
@@ -558,6 +586,7 @@ typedef struct
int rt_index;
int new_index;
int sublevels_up;
+ bool change_RangeTblRef;
} ChangeVarNodes_context;
static bool
@@ -590,7 +619,7 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
cexpr->cvarno = context->new_index;
return false;
}
- if (IsA(node, RangeTblRef))
+ if (IsA(node, RangeTblRef) && context->change_RangeTblRef)
{
RangeTblRef *rtr = (RangeTblRef *) node;
@@ -637,6 +666,75 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ replace_relid(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ replace_relid(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ replace_relid(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ replace_relid(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ replace_relid(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ replace_relid(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * for SJE (self join elimination), changing varnos will make t1.a = t2.a
+ * to "t1.a = t1.a", which is always true, we can optizmied it out.
+ * to optimzied it out, we use equal to validate,
+ * but we also need to add a not null qual (NullTest).
+ *
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
@@ -670,6 +768,55 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
(void *) context);
}
+/*
+ * almost equivalent to ChangeVarNodes. also see comments in ChangeVarNodes.
+ * difference with ChangeVarNodes:
+ * when we ChangeVarNodes change varno for node we change all of it's underlying nodes.
+ * but for SJE we don't want to change node: RangeTblRef.
+ * because SJE last step, remove_rel_from_joinlist is to
+ * remove left node (RangeTblRef) one by one. iff in any case while using
+ * ChangeVarNodes, by accidently leaf node RangeTblRef also being updated
+ * then remove_rel_from_joinlist will have error.
+*/
+void
+ChangeVarNodesExtended(Node *node, int rt_index, int new_index, int sublevels_up, bool change_RangeTblRef)
+{
+ ChangeVarNodes_context context;
+ context.rt_index = rt_index;
+ context.new_index = new_index;
+ context.sublevels_up = sublevels_up;
+ context.change_RangeTblRef = change_RangeTblRef;
+
+ if (node && IsA(node, Query))
+ {
+ Query *qry = (Query *) node;
+ if (sublevels_up == 0)
+ {
+ ListCell *l;
+
+ if (qry->resultRelation == rt_index)
+ qry->resultRelation = new_index;
+
+ if (qry->mergeTargetRelation == rt_index)
+ qry->mergeTargetRelation = new_index;
+
+ if (qry->onConflict && qry->onConflict->exclRelIndex == rt_index)
+ qry->onConflict->exclRelIndex = new_index;
+
+ foreach(l, qry->rowMarks)
+ {
+ RowMarkClause *rc = (RowMarkClause *) lfirst(l);
+
+ if (rc->rti == rt_index)
+ rc->rti = new_index;
+ }
+ }
+ query_tree_walker(qry, ChangeVarNodes_walker,
+ (void *) &context, 0);
+ }
+ else
+ ChangeVarNodes_walker(node, &context);
+}
void
ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
{
@@ -678,6 +825,7 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
context.rt_index = rt_index;
context.new_index = new_index;
context.sublevels_up = sublevels_up;
+ context.change_RangeTblRef = true;
/*
* Must be prepared to start with a Query or a bare expression tree; if
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 630ed0f1..40db9d25 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -988,6 +988,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 14ccfc1a..56b34ae3 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -728,7 +728,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -967,7 +967,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3432,4 +3432,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 7b63c5cf..4381124f 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -191,6 +191,8 @@ extern SortGroupClause *get_sortgroupref_clause_noerr(Index sortref,
* output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0020 /* recurse into PlaceHolderVar
* arguments */
+#define PVC_INCLUDE_CONVERTROWTYPES 0x0040 /* include ConvertRowtypeExprs in
+ * output list */
extern Bitmapset *pull_varnos(PlannerInfo *root, Node *node);
extern Bitmapset *pull_varnos_of_level(PlannerInfo *root, Node *node, int levelsup);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 5e88c022..fff4b69d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index aafc1737..f2e3fa4c 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -108,6 +109,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index ac6d2049..074847f1 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -41,11 +42,14 @@ typedef enum ReplaceVarsNoMatchOption
} ReplaceVarsNoMatchOption;
+extern Bitmapset *replace_relid(Relids relids, int oldId, int newId);
extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
List *src_rtable, List *src_perminfos);
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
extern void ChangeVarNodes(Node *node, int rt_index, int new_index,
int sublevels_up);
+void ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef);
extern void IncrementVarSublevelsUp(Node *node, int delta_sublevels_up,
int min_sublevels_up);
extern void IncrementVarSublevelsUp_rtable(List *rtable,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047..3d5de283 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 53f70d72..4e4cec63 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6213,6 +6213,1070 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: (oid < '10'::oid)
+(6 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+-------------------------------------------
+ HashSetOp Except All
+ -> Append
+ -> Subquery Scan on "*SELECT* 1"
+ -> Seq Scan on emp1 c2
+ -> Subquery Scan on "*SELECT* 2"
+ -> Seq Scan on emp1 c3
+(6 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 729620de..9da8c501 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -153,10 +153,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a31..77dd964e 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index d81ff63b..3e94e0af 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2345,6 +2345,492 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 635e6d6e..74091ba4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -390,6 +390,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2565,6 +2566,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -3997,6 +3999,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.34.1
On 7/11/24 14:43, jian he wrote:
On Tue, Jul 9, 2024 at 2:06 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 7/2/24 07:25, jian he wrote:
to make sure it's correct, I have added a lot of tests,
Some of this may be contrived, maybe some of the tests are redundant.Thanks for your job!
I passed through the patches and have some notes:
1. Patch 0001 has not been applied anymore since the previous week's
changes in the core. Also, there is one place with trailing whitespace.thanks.
because the previous thread mentioned the EPQ problem.
in remove_useless_self_joins, i make it can only process CMD_SELECT query.
I would like to oppose here: IMO, it is just a mishap which we made
because of a long history of patch transformations. There we lost the
case where RowMark exists for only one of candidate relations.
Also, after review I think we don't need so many new tests. Specifically
for DML we already have one:
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
And we should just add something to elaborate it a bit.
See the patch in attachment containing my proposal to improve v4-0001
main SJE patch. I think it resolved the issue with EPQ assertion as well
as problems with returning value.
--
regards, Andrei Lepikhov
Attachments:
v4-0002-Apply-SJE-to-DML-queries-2.patchtext/x-patch; charset=UTF-8; name=v4-0002-Apply-SJE-to-DML-queries-2.patchDownload
From c04add30999ecd64c51bde7db56a6e5637c16c74 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 9 Jul 2024 12:25:23 +0700
Subject: [PATCH] Apply SJE to DML queries: Just don't include result relation
to the set of SJE candidates.
Also, fix the subtle bug with RowMarks.
---
src/backend/optimizer/plan/analyzejoins.c | 24 +++------
src/test/regress/expected/join.out | 61 +++++++++++++++++++++++
src/test/regress/sql/join.sql | 17 ++++++-
3 files changed, 84 insertions(+), 18 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index bb14597762..d2b9ba7c08 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1860,10 +1860,6 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
/* restore the rangetblref in a proper order. */
restore_rangetblref((Node *) root->parse, toKeep->relid, toRemove->relid, 0, 0);
- /* See remove_self_joins_one_group() */
- Assert(root->parse->resultRelation != toRemove->relid);
- Assert(root->parse->resultRelation != toKeep->relid);
-
/* Replace links in the planner info */
remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
@@ -2046,14 +2042,6 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
{
RelOptInfo *inner = root->simple_rel_array[r];
- /*
- * We don't accept result relation as either source or target relation
- * of SJE, because result relation has different behavior in
- * EvalPlanQual() and RETURNING clause.
- */
- if (root->parse->resultRelation == r)
- continue;
-
k = r;
while ((k = bms_next_member(relids, k)) > 0)
@@ -2069,9 +2057,6 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
PlanRowMark *imark = NULL;
List *uclauses = NIL;
- if (root->parse->resultRelation == k)
- continue;
-
/* A sanity check: the relations have the same Oid. */
Assert(root->simple_rte_array[k]->relid ==
root->simple_rte_array[r]->relid);
@@ -2121,7 +2106,8 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
if (omark && imark)
break;
}
- if (omark && imark && omark->markType != imark->markType)
+ if (((omark == NULL) ^ (imark == NULL)) ||
+ (omark && omark->markType != imark->markType))
continue;
/*
@@ -2231,7 +2217,8 @@ remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
*/
if (rte->rtekind == RTE_RELATION &&
rte->relkind == RELKIND_RELATION &&
- rte->tablesample == NULL)
+ rte->tablesample == NULL &&
+ varno != root->parse->resultRelation)
{
Assert(!bms_is_member(varno, relids));
relids = bms_add_member(relids, varno);
@@ -2300,6 +2287,9 @@ remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
relids = bms_del_members(relids, group);
+ /* Don't apply SJE to result relation */
+ Assert(!bms_is_member(root->parse->resultRelation, group));
+
/*
* Try to remove self-joins from a group of identical entries.
* Make the next attempt iteratively - if something is deleted
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 4e4cec633a..78dfcd4866 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -7068,6 +7068,18 @@ UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
-> Seq Scan on sj sz
(6 rows)
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = sz.b FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
CREATE RULE sj_del_rule AS ON DELETE TO sj
DO INSTEAD
UPDATE sj SET a = 1 WHERE a = old.a;
@@ -7083,6 +7095,55 @@ EXPLAIN (COSTS OFF) DELETE FROM sj;
(6 rows)
DROP RULE sj_del_rule ON sj CASCADE;
+-- Allow SJE, remove (s2 JOIN s3).
+EXPLAIN (COSTS OFF)
+UPDATE sj s1 SET b = s2.b, a = s3.a FROM sj AS s2, sj AS s3
+WHERE s1.a = s2.a AND s1.a=s3.a RETURNING s1.*,s2.*,s3.*;
+ QUERY PLAN
+---------------------------------------------
+ Update on sj s1
+ -> Nested Loop
+ Join Filter: (s1.a = s3.a)
+ -> Seq Scan on sj s1
+ -> Materialize
+ -> Seq Scan on sj s3
+ Filter: (a IS NOT NULL)
+(7 rows)
+
+-- Allow SJE, but it is just impossible
+EXPLAIN (COSTS OFF)
+UPDATE sj s1 SET b = s2.b, a = s3.a FROM sj AS s2, sj AS s3
+WHERE s1.a = s2.a AND s1.b=s3.b;
+ QUERY PLAN
+-------------------------------------------------
+ Update on sj s1
+ -> Nested Loop
+ Join Filter: (s1.b = s3.b)
+ -> Seq Scan on sj s3
+ -> Materialize
+ -> Nested Loop
+ Join Filter: (s1.a = s2.a)
+ -> Seq Scan on sj s1
+ -> Materialize
+ -> Seq Scan on sj s2
+(10 rows)
+
+-- Allow SJE. One more shot for the case of subquery
+EXPLAIN (COSTS OFF)
+UPDATE sj s1 SET b = q.b, a = q.a FROM (
+ SELECT s2.a AS a, s3.b AS b FROM sj AS s2, sj AS s3
+ WHERE s2.a=s3.a) AS q WHERE s1.b=q.a;
+ QUERY PLAN
+---------------------------------------------
+ Update on sj s1
+ -> Nested Loop
+ Join Filter: (s3.a = s1.b)
+ -> Seq Scan on sj s1
+ -> Materialize
+ -> Seq Scan on sj s3
+ Filter: (a IS NOT NULL)
+(7 rows)
+
-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
insert into emp1 values (1, 1);
explain (costs off)
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3e94e0af53..7b32a9bb95 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2724,13 +2724,28 @@ TRUNCATE emp1;
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
-
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = sz.b FROM sj as sz WHERE sq.a = sz.a;
CREATE RULE sj_del_rule AS ON DELETE TO sj
DO INSTEAD
UPDATE sj SET a = 1 WHERE a = old.a;
EXPLAIN (COSTS OFF) DELETE FROM sj;
DROP RULE sj_del_rule ON sj CASCADE;
+-- Allow SJE, remove (s2 JOIN s3).
+EXPLAIN (COSTS OFF)
+UPDATE sj s1 SET b = s2.b, a = s3.a FROM sj AS s2, sj AS s3
+WHERE s1.a = s2.a AND s1.a=s3.a RETURNING s1.*,s2.*,s3.*;
+-- Allow SJE, but it is just impossible
+EXPLAIN (COSTS OFF)
+UPDATE sj s1 SET b = s2.b, a = s3.a FROM sj AS s2, sj AS s3
+WHERE s1.a = s2.a AND s1.b=s3.b;
+-- Allow SJE. One more shot for the case of subquery
+EXPLAIN (COSTS OFF)
+UPDATE sj s1 SET b = q.b, a = q.a FROM (
+ SELECT s2.a AS a, s3.b AS b FROM sj AS s2, sj AS s3
+ WHERE s2.a=s3.a) AS q WHERE s1.b=q.a;
+
-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
insert into emp1 values (1, 1);
explain (costs off)
--
2.39.2
Hi, Andrei!
On Fri, Jul 12, 2024 at 6:05 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 7/11/24 14:43, jian he wrote:
On Tue, Jul 9, 2024 at 2:06 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 7/2/24 07:25, jian he wrote:
to make sure it's correct, I have added a lot of tests,
Some of this may be contrived, maybe some of the tests are redundant.Thanks for your job!
I passed through the patches and have some notes:
1. Patch 0001 has not been applied anymore since the previous week's
changes in the core. Also, there is one place with trailing whitespace.thanks.
because the previous thread mentioned the EPQ problem.
in remove_useless_self_joins, i make it can only process CMD_SELECT query.I would like to oppose here: IMO, it is just a mishap which we made
because of a long history of patch transformations. There we lost the
case where RowMark exists for only one of candidate relations.
Also, after review I think we don't need so many new tests. Specifically
for DML we already have one:EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;And we should just add something to elaborate it a bit.
See the patch in attachment containing my proposal to improve v4-0001
main SJE patch. I think it resolved the issue with EPQ assertion as well
as problems with returning value.
I tried this. I applied 0001 from [1] and 0002 from [2]. Then I
tried the concurrent test case [3]. It still fails with assert for
me. But assert and related stuff is the least problem. The big
problem, as described in [3], is semantical change in query. When EPQ
is applied, we fetch the latest tuple of the target relation
regardless snapshot. But for the self-joined relation we should still
use the snapshot-satisfying tuple. I don't see even attempt to
address this in your patch. And as I pointed before, this appears
quite complex.
Links.
1. /messages/by-id/96250a42-20e3-40f0-9d45-f53ae852f8ed@gmail.com
2. /messages/by-id/5b49501c-9cb3-4c5d-9d56-49704ff08143@gmail.com
3. /messages/by-id/CAPpHfduM6X82ExT0r9UzFLJ12wOYPvRw5vT2Htq0gAPBgHhKeQ@mail.gmail.com
------
Regards,
Alexander Korotkov
Supabase
On Fri, Jul 12, 2024 at 1:30 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Fri, Jul 12, 2024 at 6:05 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 7/11/24 14:43, jian he wrote:
On Tue, Jul 9, 2024 at 2:06 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 7/2/24 07:25, jian he wrote:
to make sure it's correct, I have added a lot of tests,
Some of this may be contrived, maybe some of the tests are redundant.Thanks for your job!
I passed through the patches and have some notes:
1. Patch 0001 has not been applied anymore since the previous week's
changes in the core. Also, there is one place with trailing whitespace.thanks.
because the previous thread mentioned the EPQ problem.
in remove_useless_self_joins, i make it can only process CMD_SELECT query.I would like to oppose here: IMO, it is just a mishap which we made
because of a long history of patch transformations. There we lost the
case where RowMark exists for only one of candidate relations.
Also, after review I think we don't need so many new tests. Specifically
for DML we already have one:EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;And we should just add something to elaborate it a bit.
See the patch in attachment containing my proposal to improve v4-0001
main SJE patch. I think it resolved the issue with EPQ assertion as well
as problems with returning value.I tried this. I applied 0001 from [1] and 0002 from [2]. Then I
tried the concurrent test case [3]. It still fails with assert for
me. But assert and related stuff is the least problem. The big
problem, as described in [3], is semantical change in query. When EPQ
is applied, we fetch the latest tuple of the target relation
regardless snapshot. But for the self-joined relation we should still
use the snapshot-satisfying tuple. I don't see even attempt to
address this in your patch. And as I pointed before, this appears
quite complex.
Oh, sorry, I used wrong binaries during the check. My test case works
correctly, because SJE doesn't apply to the target relation.
# explain update test set val = t.val + 1 from test t where test.id = t.id;
QUERY PLAN
-----------------------------------------------------------------------------
Update on test (cost=60.85..105.04 rows=0 width=0)
-> Hash Join (cost=60.85..105.04 rows=2260 width=16)
Hash Cond: (test.id = t.id)
-> Seq Scan on test (cost=0.00..32.60 rows=2260 width=10)
-> Hash (cost=32.60..32.60 rows=2260 width=14)
-> Seq Scan on test t (cost=0.00..32.60 rows=2260 width=14)
(6 rows)
Previously, patch rejected applying SJE for result relation, which as
I see now is wrong. Andrei's patch rejects SJE for target relation on
the base of row marks, which seems correct to me as the first glance.
So, this doesn't change anything regarding my conclusions regarding
applying SJE for target relation. But the Andrei's patch yet looks
good indeed.
------
Regards,
Alexander Korotkov
Supabase
hi.
Here is the latest patch (v6),
I've made the following changes.
* disallow original Query->resultRelation participate in SJE.
for SELECT, nothing is changed. for UPDATE/DELETE/MERGE
we can do:
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = sq.b + sz.a FROM (select s1.* from sj s1 join sj
s2 on s1.a = s2.a) as sz
WHERE sz.a = sq.a;
here, only "(select s1.* from sj s1 join sj s2 on s1.a = s2.a)" can
apply to SJE.
but for now we cannot apply SJE to
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a;
so the EPQ abnormality issue[1]/messages/by-id/flat/CAPpHfduM6X82ExT0r9UzFLJ12wOYPvRw5vT2Htq0gAPBgHhKeQ@mail.gmail.com won't happen.
* add a new function: ChangeVarNodesExtended for
address concerns in [2]/messages/by-id/flat/CAPpHfdvBddujLDhf7TQP-djeKoG5oyFBEoLSGRsjHfGrcNFkDg@mail.gmail.com
* cosmetic refactor remove_self_join_rel, match_unique_clauses,
split_selfjoin_quals functions.
changing some "foreach" to "foreach_node"
* refactor comments above (remove_self_joins_one_group ->> match_unique_clauses)
I am not 100% sure they are correct.
[1]: /messages/by-id/flat/CAPpHfduM6X82ExT0r9UzFLJ12wOYPvRw5vT2Htq0gAPBgHhKeQ@mail.gmail.com
[2]: /messages/by-id/flat/CAPpHfdvBddujLDhf7TQP-djeKoG5oyFBEoLSGRsjHfGrcNFkDg@mail.gmail.com
Attachments:
v6-0001-Remove-useless-self-joins.patchapplication/x-patch; name=v6-0001-Remove-useless-self-joins.patchDownload
From 2e8a79a2203eeaeb004af69f23f32025e2e29ead Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 15 Jul 2024 11:30:17 +0800
Subject: [PATCH v6 1/1] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/equivclass.c | 3 +-
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1126 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 7 +-
src/backend/rewrite/rewriteManip.c | 150 ++-
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 35 +-
src/include/optimizer/optimizer.h | 2 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/include/rewrite/rewriteManip.h | 4 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1064 +++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 486 +++++++++
src/tools/pgindent/typedefs.list | 3 +
19 files changed, 2924 insertions(+), 84 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b14c5d81..27ec59c6 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5612,6 +5612,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 51d80632..b1863726 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -856,7 +856,8 @@ find_computable_ec_member(PlannerInfo *root,
exprvars = pull_var_clause((Node *) em->em_expr,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ PVC_INCLUDE_PLACEHOLDERS |
+ PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc2, exprvars)
{
if (!is_exprlist_member(lfirst(lc2), exprs))
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c0fcc7d7..2230b131 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3440,6 +3440,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3495,6 +3511,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3546,6 +3563,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3588,7 +3623,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index c3fd4a81..89b194a3 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -29,12 +30,31 @@
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. It is needed to perform a sorting procedure and simplify
+ * the search of SJE-candidate baserels referencing the same database relation.
+ * Having collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts to
+ * remove self-joins.
+ * Preliminary sorting prevents quadratic behaviour that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -42,14 +62,16 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -87,7 +109,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -291,8 +313,8 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
continue; /* not mergejoinable */
/*
- * Check if the clause has the form "outer op inner" or "inner op
- * outer", and if so mark which side is inner.
+ * Check if clause has the form "outer op inner" or "inner op outer",
+ * and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
innerrel->relids))
@@ -306,7 +328,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -318,31 +340,27 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
- * Remove references to the rel from other baserels' attr_needed arrays.
+ * Remove references to the rel from other baserels' attr_needed arrays
+ * and lateral_vars lists.
*/
for (rti = 1; rti < root->simple_rel_array_size; rti++)
{
@@ -364,19 +382,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ replace_relid(otherrel->attr_needed[attroff], relid, subst);
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], ojrelid);
+ replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
+
+ /* Update lateral_vars list. */
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -390,30 +412,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- /*
- * initsplan.c is fairly cavalier about allowing SpecialJoinInfos'
- * lefthand/righthand relid sets to be shared with other data
- * structures. Ensure that we don't modify the original relid sets.
- * (The commute_xxx sets are always per-SpecialJoinInfo though.)
- */
- sjinf->min_lefthand = bms_copy(sjinf->min_lefthand);
- sjinf->min_righthand = bms_copy(sjinf->min_righthand);
- sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
- sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
+
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
}
/*
@@ -434,10 +449,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -447,18 +462,58 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
/*
* Remove any joinquals referencing the rel from the joininfo lists.
@@ -856,9 +911,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * extra_clauses contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the extra_clauses, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -871,10 +931,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1188,9 +1249,33 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
+ * additional clauses from a baserestrictinfo list that were used to prove
+ * uniqueness. A non NULL 'extra_clauses' indicates that we're checking
+ * for self-join and correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1205,17 +1290,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels, because we need extra clauses to be valid for our case.
+ * Also, for self-join checking we've filtered the clauses list. Thus,
+ * for a self-join search, we can match only the result cached for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1232,7 +1328,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1245,10 +1342,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1294,7 +1397,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1324,17 +1428,885 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+
+ foreach_node(EquivalenceMember, em, ec->ec_members)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach_node(EquivalenceMember, other, new_members)
+ {
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach_node(RestrictInfo, rinfo, ec->ec_sources)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach_node(RestrictInfo, other, new_sources)
+ {
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach_node(RestrictInfo, rinfo, joininfos)
+ {
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach_node(RestrictInfo, rinfo, toRemove->baserestrictinfo)
+ {
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach_node(RestrictInfo, rinfo, binfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->baserestrictinfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach_node(RestrictInfo, rinfo, jinfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->joininfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures, except nodes RangeTblRef
+ * otherwise later remove_rel_from_joinlist will yield errors.
+ */
+ ChangeVarNodesExtended((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ replace_relid(root->all_result_relids, toRemove->relid, toKeep->relid);
+ replace_relid(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach_node(RestrictInfo, rinfo, joinquals)
+ {
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ foreach_node(RestrictInfo, rinfo, uclauses)
+ {
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach_node(RestrictInfo, orinfo, outer->baserestrictinfo)
+ {
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+ if (restrictlist == NIL)
+ continue;
+
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ Assert(list_length(restrictlist) ==
+ (list_length(selfjoinquals) + list_length(otherjoinquals)));
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * uclauses is the copy of outer->baserestrictinfo that associated with a index.
+ * We already proven that based on {outer->baserestrictinfo, selfjoinquals}
+ * comparsion with unique index we can guarantees that for the outer rel there is
+ * at most one row where columns equal given values. Sometimes that is not ok. e.g.
+ * "where s1.b = s2.b and s1.a = 1 and s2.a = 2". (the index is based on column(a,b).
+ * in these scarenio, we also need to validate equality of outer->baserestrictinfo
+ * with inner->outer->baserestrictinfo,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be removed,
+ * and these relations should not have TABLESAMPLE clauses
+ * specified. Removing a relation with TABLESAMPLE clause could
+ * potentially change the syntax of the query.
+ * Because of UPDATE/DELETE EPQ meachanism, currently Query->resultRelation
+ * or Query->mergeTargetRelation associated rel cannot be eliminated.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL &&
+ varno != root->parse->resultRelation &&
+ varno != root->parse->mergeTargetRelation)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5..075d36c7 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -230,6 +230,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 1c69c6e9..c5a8d00b 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -662,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
@@ -670,6 +670,9 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
* be an RTE corresponding to each setop's output.
+ * Note, we use this not subquery's targetlist but subroot->parse's
+ * targetlist, because it was revised by self-join removal. subquery's
+ * targetlist might contain the references to the removed relids.
*/
if (pNumGroups)
{
@@ -682,7 +685,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index 191f2dc0..da69ac5a 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -540,6 +540,33 @@ offset_relid_set(Relids relids, int offset)
return result;
}
+/*
+ * Substitute newId by oldId in relids.
+ *
+ * We must make a copy of the original Bitmapset before making any
+ * modifications, because the same pointer to it might be shared among
+ * different places.
+ * Also, this function can be used in 'delete only' mode (newId < 0).
+ * It allows us to utilise the same code in the remove_useless_joins and the
+ * remove_self_joins features.
+ */
+Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (oldId < 0)
+ return relids;
+
+ /* Delete relid without substitution. */
+ if (newId < 0)
+ return bms_del_member(bms_copy(relids), oldId);
+
+ /* Substitute newId for oldId. */
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
/*
* ChangeVarNodes - adjust Var nodes for a specific change of RT index
*
@@ -548,6 +575,7 @@ offset_relid_set(Relids relids, int offset)
* to 'new_index'. The varnosyn fields are changed too. Also, adjust other
* nodes that contain rangetable indexes, such as RangeTblRef and JoinExpr.
*
+ * change_RangeTblRef: do we change RangeTblRef or not. see ChangeVarNodesExtended.
* NOTE: although this has the form of a walker, we cheat and modify the
* nodes in-place. The given expression tree should have been copied
* earlier to ensure that no unwanted side-effects occur!
@@ -558,6 +586,7 @@ typedef struct
int rt_index;
int new_index;
int sublevels_up;
+ bool change_RangeTblRef;
} ChangeVarNodes_context;
static bool
@@ -590,7 +619,7 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
cexpr->cvarno = context->new_index;
return false;
}
- if (IsA(node, RangeTblRef))
+ if (IsA(node, RangeTblRef) && context->change_RangeTblRef)
{
RangeTblRef *rtr = (RangeTblRef *) node;
@@ -637,6 +666,75 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ replace_relid(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ replace_relid(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ replace_relid(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ replace_relid(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ replace_relid(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ replace_relid(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * for SJE (self join elimination), changing varnos will make t1.a = t2.a
+ * to "t1.a = t1.a", which is always true, we can optizmied it out.
+ * to optimzied it out, we use equal to validate,
+ * but we also need to add a not null qual (NullTest).
+ *
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
@@ -670,6 +768,55 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
(void *) context);
}
+/*
+ * almost equivalent to ChangeVarNodes. also see comments in ChangeVarNodes.
+ * difference with ChangeVarNodes:
+ * when we use ChangeVarNodes for node we change all of it's underlying nodes.
+ * but for SJE we don't want to change node type: RangeTblRef, in some occasion.
+ * because SJE last step, remove_rel_from_joinlist is to
+ * remove left node (RangeTblRef) one by one. if in any case while using
+ * ChangeVarNodes, by accidently leaf node RangeTblRef also being updated
+ * then remove_rel_from_joinlist will have error.
+*/
+void
+ChangeVarNodesExtended(Node *node, int rt_index, int new_index, int sublevels_up, bool change_RangeTblRef)
+{
+ ChangeVarNodes_context context;
+ context.rt_index = rt_index;
+ context.new_index = new_index;
+ context.sublevels_up = sublevels_up;
+ context.change_RangeTblRef = change_RangeTblRef;
+
+ if (node && IsA(node, Query))
+ {
+ Query *qry = (Query *) node;
+ if (sublevels_up == 0)
+ {
+ ListCell *l;
+
+ if (qry->resultRelation == rt_index)
+ qry->resultRelation = new_index;
+
+ if (qry->mergeTargetRelation == rt_index)
+ qry->mergeTargetRelation = new_index;
+
+ if (qry->onConflict && qry->onConflict->exclRelIndex == rt_index)
+ qry->onConflict->exclRelIndex = new_index;
+
+ foreach(l, qry->rowMarks)
+ {
+ RowMarkClause *rc = (RowMarkClause *) lfirst(l);
+
+ if (rc->rti == rt_index)
+ rc->rti = new_index;
+ }
+ }
+ query_tree_walker(qry, ChangeVarNodes_walker,
+ (void *) &context, 0);
+ }
+ else
+ ChangeVarNodes_walker(node, &context);
+}
void
ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
{
@@ -678,6 +825,7 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
context.rt_index = rt_index;
context.new_index = new_index;
context.sublevels_up = sublevels_up;
+ context.change_RangeTblRef = true;
/*
* Must be prepared to start with a Query or a bare expression tree; if
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 630ed0f1..40db9d25 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -988,6 +988,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 14ccfc1a..56b34ae3 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -728,7 +728,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -967,7 +967,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3432,4 +3432,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 7b63c5cf..4381124f 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -191,6 +191,8 @@ extern SortGroupClause *get_sortgroupref_clause_noerr(Index sortref,
* output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0020 /* recurse into PlaceHolderVar
* arguments */
+#define PVC_INCLUDE_CONVERTROWTYPES 0x0040 /* include ConvertRowtypeExprs in
+ * output list */
extern Bitmapset *pull_varnos(PlannerInfo *root, Node *node);
extern Bitmapset *pull_varnos_of_level(PlannerInfo *root, Node *node, int levelsup);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 5e88c022..fff4b69d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index aafc1737..f2e3fa4c 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -108,6 +109,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index ac6d2049..074847f1 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -41,11 +42,14 @@ typedef enum ReplaceVarsNoMatchOption
} ReplaceVarsNoMatchOption;
+extern Bitmapset *replace_relid(Relids relids, int oldId, int newId);
extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
List *src_rtable, List *src_perminfos);
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
extern void ChangeVarNodes(Node *node, int rt_index, int new_index,
int sublevels_up);
+void ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef);
extern void IncrementVarSublevelsUp(Node *node, int delta_sublevels_up,
int min_sublevels_up);
extern void IncrementVarSublevelsUp_rtable(List *rtable,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047..3d5de283 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 53f70d72..4e4cec63 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6213,6 +6213,1070 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: (oid < '10'::oid)
+(6 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+-------------------------------------------
+ HashSetOp Except All
+ -> Append
+ -> Subquery Scan on "*SELECT* 1"
+ -> Seq Scan on emp1 c2
+ -> Subquery Scan on "*SELECT* 2"
+ -> Seq Scan on emp1 c3
+(6 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 729620de..9da8c501 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -153,10 +153,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a31..77dd964e 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index d81ff63b..3e94e0af 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2345,6 +2345,492 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 635e6d6e..74091ba4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -390,6 +390,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2565,6 +2566,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -3997,6 +3999,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.34.1
On 7/15/24 12:31, jian he wrote:
hi.
Here is the latest patch (v6),
I've made the following changes.* disallow original Query->resultRelation participate in SJE.
for SELECT, nothing is changed. for UPDATE/DELETE/MERGE
we can do:
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = sq.b + sz.a FROM (select s1.* from sj s1 join sj
s2 on s1.a = s2.a) as sz
WHERE sz.a = sq.a;here, only "(select s1.* from sj s1 join sj s2 on s1.a = s2.a)" can
apply to SJE.but for now we cannot apply SJE to
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a;so the EPQ abnormality issue[1] won't happen.
* add a new function: ChangeVarNodesExtended for
address concerns in [2]
I see you still stay with the code line:
if (omark && imark && omark->markType != imark->markType)
It is definitely an error. What if omark is NULL, but imark is not? Why
not to skip this pair of relids? Or, at least, insert an assertion to
check that you filtered it earlier.
--
regards, Andrei Lepikhov
On Mon, Jul 15, 2024 at 2:08 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 7/15/24 12:31, jian he wrote:
hi.
Here is the latest patch (v6),
I've made the following changes.* disallow original Query->resultRelation participate in SJE.
for SELECT, nothing is changed. for UPDATE/DELETE/MERGE
we can do:
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = sq.b + sz.a FROM (select s1.* from sj s1 join sj
s2 on s1.a = s2.a) as sz
WHERE sz.a = sq.a;here, only "(select s1.* from sj s1 join sj s2 on s1.a = s2.a)" can
apply to SJE.but for now we cannot apply SJE to
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a;so the EPQ abnormality issue[1] won't happen.
* add a new function: ChangeVarNodesExtended for
address concerns in [2]I see you still stay with the code line:
if (omark && imark && omark->markType != imark->markType)It is definitely an error. What if omark is NULL, but imark is not? Why
not to skip this pair of relids? Or, at least, insert an assertion to
check that you filtered it earlier.
i think "omark is NULL, but imark is not" case won't reach to
remove_self_joins_one_group.
In that case, omark associated RangeTblEntry->rtekind will be RTE_SUBQUERY,
and will be skipped earlier in remove_self_joins_recurse.
Still, do you think the following code is the right way to go?
if ((omark == NULL && imark != NULL) ||
(omark != NULL && imark == NULL) ||
(omark && imark && omark->markType != imark->markType))
continue;
On 7/15/24 14:35, jian he wrote:
On Mon, Jul 15, 2024 at 2:08 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 7/15/24 12:31, jian he wrote:
hi.
Here is the latest patch (v6),
I've made the following changes.* disallow original Query->resultRelation participate in SJE.
for SELECT, nothing is changed. for UPDATE/DELETE/MERGE
we can do:
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = sq.b + sz.a FROM (select s1.* from sj s1 join sj
s2 on s1.a = s2.a) as sz
WHERE sz.a = sq.a;here, only "(select s1.* from sj s1 join sj s2 on s1.a = s2.a)" can
apply to SJE.but for now we cannot apply SJE to
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = sq.b + sz.a FROM sj as sz WHERE sz.a = sq.a;so the EPQ abnormality issue[1] won't happen.
* add a new function: ChangeVarNodesExtended for
address concerns in [2]I see you still stay with the code line:
if (omark && imark && omark->markType != imark->markType)It is definitely an error. What if omark is NULL, but imark is not? Why
not to skip this pair of relids? Or, at least, insert an assertion to
check that you filtered it earlier.i think "omark is NULL, but imark is not" case won't reach to
remove_self_joins_one_group.
In that case, omark associated RangeTblEntry->rtekind will be RTE_SUBQUERY,
and will be skipped earlier in remove_self_joins_recurse.Still, do you think the following code is the right way to go?
if ((omark == NULL && imark != NULL) ||
(omark != NULL && imark == NULL) ||
(omark && imark && omark->markType != imark->markType))
continue;
Sure, if query block needs RowMark it applies proper RowMark to each
base relation. All pull-up transformations executes before this code.
But it is worth to set Assert at the point to check that nothing changed
in the code above and the patch works correctly, am I wrong?
--
regards, Andrei Lepikhov
<div><div><div>Hi!</div><div> </div><div>I did the SJE testing at Andrey's request.</div><div>To do this, I used the automatic testing tool EET (Equivalent Expression Transformation) [1] with some modifications. </div><div>EET transforms the logical conditions in a query, creating multiple queries waiting for the same number of rows.</div><div> </div><div>In order to make sure that the SJE logic is executed at all, I tried to cover the code with ereports() as much as possible.</div><div>During the testing process, I did not find any inconsistencies in the number of rows returned, as well as other critical problems.</div></div><div> </div><div> </div><div><div>[1] <a href="https://github.com/JZuming/EET" rel="noopener noreferrer" target="_blank">https://github.com/JZuming/EET</a></div><div> </div></div></div><div>-- </div><div>Regards,</div><div>Vardan Pogosyan</div><div> </div>
Hi, Vardan!
Great, thank you!
On Tue, Jul 16, 2024 at 5:26 PM Вардан Погосян <vardan.pogosyn@yandex.ru> wrote:
I did the SJE testing at Andrey's request.
To do this, I used the automatic testing tool EET (Equivalent Expression Transformation) [1] with some modifications.
EET transforms the logical conditions in a query, creating multiple queries waiting for the same number of rows.
What revision of patch did you use?
In order to make sure that the SJE logic is executed at all, I tried to cover the code with ereports() as much as possible.
Could you share this? Probably some of these ereports() we would like to keep.
During the testing process, I did not find any inconsistencies in the number of rows returned, as well as other critical problems.
Did you use assert-enabled build? I guess you mean no server crashes,
right? Also, could you share some statistics on how long did you run,
what number of queries did you execute etc.?
------
Regards,
Alexander Korotkov
Supabase
Hi, Tom!
I'd like to give you and update on the progress with SJE.
On Mon, May 6, 2024 at 6:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I want to go on record right now as disagreeing with the plan proposed
in the commit message for the revert commit, namely, committing this
again early in the v18 cycle. I don't think Tom would have proposed
reverting this feature unless he believed that it had more serious
problems than could be easily fixed in a short period of time. I think
that concern is well-founded, given the number of fixes that were
committed. It seems likely that the patch needs significant rework and
stabilization before it gets committed again, and I think it shouldn't
be committed again without explicit agreement from Tom or one of the
other committers who have significant experience with the query
planner.FWIW I accept some of the blame here, for not having paid any
attention to the SJE work earlier. I had other things on my mind
for most of last year, and not enough bandwidth to help.The main thing I'd like to understand before we try this again is
why SJE needed so much new query-tree-manipulation infrastructure.
I would have expected it to be very similar to the left-join
elimination we do already, and therefore to mostly just share the
existing infrastructure. (I also harbor suspicions that some of
the new code existed just because someone didn't research what
was already there --- for instance, the now-removed replace_varno
sure looks like ChangeVarNodes should have been used instead.)
Jian He gave a try to ChangeVarNodes() [1]. That gives some
improvement, but the vast majority of complexity is still here. I
think the reason for complexity of SJE is that it's the first time we
remove relation, which is actually *used* and therefore might has
references in awful a lot of places. In previous cases we removed
relations, which were actually unused.
There are actually alternative designs for this feature. I've
proposed "alias relids" before [2]. But it's not clear we will
resolve more problems than create, given that it could break awfully a
lot of assumptions during query planning. Andrei also proposed that
perl script could generate us a walker over planner structures [3].
Although this method might offer a structured approach, it seems like
overengineering for the problem at hand.
I believe it's worth giving the current approach another chance.
Vardan Pogosyan has conducted some tests, and I am in the process of
clarifying the details. We could enhance the approach by adding more
comments to ensure that any changes in the planner data structure are
flagged for potential revisions in the SJE code. What do you think?
Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
result relation). While I don't claim that that destroyed the entire
use case for SJE, it certainly knocked its usefulness down by many
notches, maybe even to the point where it's not worth putting in the
effort needed to get it to re-committability. So I think we need to
look harder at finding a way around that. Is the concern that
RETURNING should return either old or new values depending on which
RTE is mentioned? If so, maybe the feature Dean has proposed to
allow RETURNING to access old values [1] is a prerequisite to moving
forward. Alternatively, perhaps it'd be good enough to forbid SJE
only when the non-target relation is actually mentioned in RETURNING.
As Andrei pointed it's possible to apply SJE to result relation [4],
but where it's not a target relation. I guess the target relation
case is what you're most interested. In this case we hit problem of
joining relation having different row marks. In turn that triggers
EPQ problem [5] and probably more. In order to resolve that we need a
way to store multiple (at least two, but sure if more is needed)
tuples for relation. I still feel that we should postpone that,
because even basic SJE without target relation support is challenging.
There is probably a way to implement target relation support for PG18
after committing basic SJE. But that would require a lot of your
design work and guidance. I don't dare to design this kind of things.
Links.
1. /messages/by-id/CACJufxHBLhOD1LerM643dgh=UZFGhPWfP1027D2x1W6DhF_BaQ@mail.gmail.com
2. /messages/by-id/CAPpHfdv6B8HCLdj8WidBryRrX0+X3F1rrR8uAuMQmp6rvPdscg@mail.gmail.com
3. /messages/by-id/96250a42-20e3-40f0-9d45-f53ae852f8ed@gmail.com
4. /messages/by-id/5b49501c-9cb3-4c5d-9d56-49704ff08143@gmail.com
5. /messages/by-id/CAPpHfduM6X82ExT0r9UzFLJ12wOYPvRw5vT2Htq0gAPBgHhKeQ@mail.gmail.com
------
Regards,
Alexander Korotkov
Supabase
On 16.07.2024 21:30, Alexander Korotkov wrote:
Hi, Vardan!
Great, thank you!
On Tue, Jul 16, 2024 at 5:26 PM Вардан Погосян<vardan.pogosyn@yandex.ru> wrote:
I did the SJE testing at Andrey's request.
To do this, I used the automatic testing tool EET (Equivalent Expression Transformation) [1] with some modifications.
EET transforms the logical conditions in a query, creating multiple queries waiting for the same number of rows.What revision of patch did you use?
I used Andrey's v4 patches.
In order to make sure that the SJE logic is executed at all, I tried to cover the code with ereports() as much as possible.
Could you share this? Probably some of these ereports() we would like to keep.
I'm not sure if it can be saved. Many reports are some messy and no
information that is really useful for the user.
During the testing process, I did not find any inconsistencies in the number of rows returned, as well as other critical problems.
Did you use assert-enabled build? I guess you mean no server crashes,
right? Also, could you share some statistics on how long did you run,
what number of queries did you execute etc.?------
Regards,
Alexander Korotkov
Supabase
Of course, i used assertion-enabled build and server didn't fail.
EET ran 10 iterations with 100 random generated queries + 100
transformed queries => 2000 queries in total.
I can't calculate exactly how many clauses EET generates after
transformations, but according to rough estimates for all test time,
about 8-9% clauses detected as self join during the entire testing period.
The last query saved by EET contains 4 clauses, and after
transformations, the new query contains ~132 clauses. SJE was not used
in the original query, but it was used 5 times after the conversion.
On Wed, 17 Jul 2024 at 01:45, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Jian He gave a try to ChangeVarNodes() [1]. That gives some
improvement, but the vast majority of complexity is still here. I
think the reason for complexity of SJE is that it's the first time we
remove relation, which is actually *used* and therefore might has
references in awful a lot of places. In previous cases we removed
relations, which were actually unused.
I had a quick look at this, and I have a couple of comments on the
rewriter changes.
The new function replace_relid() looks to be the same as adjust_relid_set().
The changes to ChangeVarNodes() look a little messy. There's a lot of
code duplicated between ChangeVarNodesExtended() and ChangeVarNodes(),
which could be avoided by having one call the other. Also, it would be
better for ChangeVarNodesExtended() to have a "flags" parameter
instead of an extra boolean parameter, to make it more extensible in
the future. However,...
I question whether ChangeVarNodesExtended() and the changes to
ChangeVarNodes() are really the right way to go about this.
ChangeVarNodes() in particular gains a lot more logic to handle
RestrictInfo nodes that doesn't really feel like it belongs there --
e.g., building NullTest nodes is really specific to SJE, and doesn't
seem like it's something ChangeVarNodes() should be doing.
A better solution might be to add a new walker function to
analyzejoins.c that does just what SJE needs, which is different from
ChangeVarNodes() in a number of ways. For Var nodes, it might
ultimately be necessary to do more than just change the varno, to
solve the RETURNING/EPQ problems. For RestrictInfo nodes, there's a
lot of SJE-specific logic. The SJE code wants to ignore RangeTblRef
nodes, but it could delegate to ChangeVarNodes() for various other
node types to avoid code duplication. At the top level, the stuff that
ChangeVarNodes() does to fields of the Query struct would be different
for SJE, I think.
Regards,
Dean
On Fri, Jul 19, 2024 at 11:30 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Wed, 17 Jul 2024 at 01:45, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Jian He gave a try to ChangeVarNodes() [1]. That gives some
improvement, but the vast majority of complexity is still here. I
think the reason for complexity of SJE is that it's the first time we
remove relation, which is actually *used* and therefore might has
references in awful a lot of places. In previous cases we removed
relations, which were actually unused.I had a quick look at this, and I have a couple of comments on the
rewriter changes.The new function replace_relid() looks to be the same as adjust_relid_set().
They are similar, not the same. replace_relid() has handling for
negative newId, while adjust_relid_set() hasn't. One thing I'd like
to borrow from adjust_relid_set() to replace_relid() is the usage of
IS_SPECIAL_VARNO() macro.
It would be probably nice to move this logic into bms_replace_member()
residing at bitmapset.c. What do you think?
The changes to ChangeVarNodes() look a little messy. There's a lot of
code duplicated between ChangeVarNodesExtended() and ChangeVarNodes(),
which could be avoided by having one call the other. Also, it would be
better for ChangeVarNodesExtended() to have a "flags" parameter
instead of an extra boolean parameter, to make it more extensible in
the future. However,...
I certainly didn't mean to have duplicate functions
ChangeVarNodesExtended() and ChangeVarNodes(). I mean
ChangeVarNodes() should just call ChangeVarNodesExtended().
I question whether ChangeVarNodesExtended() and the changes to
ChangeVarNodes() are really the right way to go about this.
ChangeVarNodes() in particular gains a lot more logic to handle
RestrictInfo nodes that doesn't really feel like it belongs there --
e.g., building NullTest nodes is really specific to SJE, and doesn't
seem like it's something ChangeVarNodes() should be doing.A better solution might be to add a new walker function to
analyzejoins.c that does just what SJE needs, which is different from
ChangeVarNodes() in a number of ways. For Var nodes, it might
ultimately be necessary to do more than just change the varno, to
solve the RETURNING/EPQ problems. For RestrictInfo nodes, there's a
lot of SJE-specific logic. The SJE code wants to ignore RangeTblRef
nodes, but it could delegate to ChangeVarNodes() for various other
node types to avoid code duplication. At the top level, the stuff that
ChangeVarNodes() does to fields of the Query struct would be different
for SJE, I think.
We initially didn't use ChangeVarNodes() in SJE at all. See the last
patch version without it [1]. We're trying to address Tom Lane's
proposal to re-use more of existing tree-manipulation infrastructure
[2]: . I agree with you that the case with ChangeVarNodes() looks questionable. Do you have other ideas how we can re-use some more of existing tree-manipulation infrastructure in SJE?
questionable. Do you have other ideas how we can re-use some more of
existing tree-manipulation infrastructure in SJE?
Links
1. /messages/by-id/55f680bc-756d-4dd3-ab27-3c6e663b0e4c@postgrespro.ru
2. /messages/by-id/3622801.1715010885@sss.pgh.pa.us
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v7-0001-Remove-useless-self-joins.patchapplication/octet-stream; name=v7-0001-Remove-useless-self-joins.patchDownload
From e2d0de1f57bce7c25785c18f1347efe31bcf9062 Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Mon, 15 Jul 2024 11:30:17 +0800
Subject: [PATCH v7] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/equivclass.c | 3 +-
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1126 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 7 +-
src/backend/rewrite/rewriteManip.c | 137 ++-
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 35 +-
src/include/optimizer/optimizer.h | 2 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/include/rewrite/rewriteManip.h | 4 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1064 +++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 486 +++++++++
src/tools/pgindent/typedefs.list | 3 +
19 files changed, 2893 insertions(+), 102 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3dec0b7cfeb..9b5342acc6f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5618,6 +5618,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 51d806326eb..b1863726816 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -856,7 +856,8 @@ find_computable_ec_member(PlannerInfo *root,
exprvars = pull_var_clause((Node *) em->em_expr,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ PVC_INCLUDE_PLACEHOLDERS |
+ PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc2, exprvars)
{
if (!is_exprlist_member(lfirst(lc2), exprs))
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c0fcc7d78df..2230b131047 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3440,6 +3440,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3495,6 +3511,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3546,6 +3563,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3588,7 +3623,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index c3fd4a81f8a..89b194a3730 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -29,12 +30,31 @@
#include "optimizer/paths.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. It is needed to perform a sorting procedure and simplify
+ * the search of SJE-candidate baserels referencing the same database relation.
+ * Having collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts to
+ * remove self-joins.
+ * Preliminary sorting prevents quadratic behaviour that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -42,14 +62,16 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -87,7 +109,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -291,8 +313,8 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
continue; /* not mergejoinable */
/*
- * Check if the clause has the form "outer op inner" or "inner op
- * outer", and if so mark which side is inner.
+ * Check if clause has the form "outer op inner" or "inner op outer",
+ * and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
innerrel->relids))
@@ -306,7 +328,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -318,31 +340,27 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
- * Remove references to the rel from other baserels' attr_needed arrays.
+ * Remove references to the rel from other baserels' attr_needed arrays
+ * and lateral_vars lists.
*/
for (rti = 1; rti < root->simple_rel_array_size; rti++)
{
@@ -364,19 +382,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
attroff--)
{
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], relid);
+ replace_relid(otherrel->attr_needed[attroff], relid, subst);
otherrel->attr_needed[attroff] =
- bms_del_member(otherrel->attr_needed[attroff], ojrelid);
+ replace_relid(otherrel->attr_needed[attroff], ojrelid, subst);
}
+
+ /* Update lateral_vars list. */
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
}
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -390,30 +412,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- /*
- * initsplan.c is fairly cavalier about allowing SpecialJoinInfos'
- * lefthand/righthand relid sets to be shared with other data
- * structures. Ensure that we don't modify the original relid sets.
- * (The commute_xxx sets are always per-SpecialJoinInfo though.)
- */
- sjinf->min_lefthand = bms_copy(sjinf->min_lefthand);
- sjinf->min_righthand = bms_copy(sjinf->min_righthand);
- sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
- sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
+
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
}
/*
@@ -434,10 +449,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -447,18 +462,58 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, relid);
- phinfo->ph_needed = bms_del_member(phinfo->ph_needed, ojrelid);
+
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, relid, subst);
+ phinfo->ph_needed = replace_relid(phinfo->ph_needed, ojrelid, subst);
/* ph_needed might or might not become empty */
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
/*
* Remove any joinquals referencing the rel from the joininfo lists.
@@ -856,9 +911,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * extra_clauses contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the extra_clauses, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -871,10 +931,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1188,9 +1249,33 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
+ * additional clauses from a baserestrictinfo list that were used to prove
+ * uniqueness. A non NULL 'extra_clauses' indicates that we're checking
+ * for self-join and correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1205,17 +1290,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels, because we need extra clauses to be valid for our case.
+ * Also, for self-join checking we've filtered the clauses list. Thus,
+ * for a self-join search, we can match only the result cached for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1232,7 +1328,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1245,10 +1342,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1294,7 +1397,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1324,17 +1428,885 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+
+ foreach_node(EquivalenceMember, em, ec->ec_members)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach_node(EquivalenceMember, other, new_members)
+ {
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach_node(RestrictInfo, rinfo, ec->ec_sources)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach_node(RestrictInfo, other, new_sources)
+ {
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach_node(RestrictInfo, rinfo, joininfos)
+ {
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach_node(RestrictInfo, rinfo, toRemove->baserestrictinfo)
+ {
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach_node(RestrictInfo, rinfo, binfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->baserestrictinfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach_node(RestrictInfo, rinfo, jinfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->joininfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures, except nodes RangeTblRef
+ * otherwise later remove_rel_from_joinlist will yield errors.
+ */
+ ChangeVarNodesExtended((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ replace_relid(root->all_result_relids, toRemove->relid, toKeep->relid);
+ replace_relid(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach_node(RestrictInfo, rinfo, joinquals)
+ {
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ foreach_node(RestrictInfo, rinfo, uclauses)
+ {
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach_node(RestrictInfo, orinfo, outer->baserestrictinfo)
+ {
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+ if (restrictlist == NIL)
+ continue;
+
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ Assert(list_length(restrictlist) ==
+ (list_length(selfjoinquals) + list_length(otherjoinquals)));
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * uclauses is the copy of outer->baserestrictinfo that associated with a index.
+ * We already proven that based on {outer->baserestrictinfo, selfjoinquals}
+ * comparsion with unique index we can guarantees that for the outer rel there is
+ * at most one row where columns equal given values. Sometimes that is not ok. e.g.
+ * "where s1.b = s2.b and s1.a = 1 and s2.a = 2". (the index is based on column(a,b).
+ * in these scarenio, we also need to validate equality of outer->baserestrictinfo
+ * with inner->outer->baserestrictinfo,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be removed,
+ * and these relations should not have TABLESAMPLE clauses
+ * specified. Removing a relation with TABLESAMPLE clause could
+ * potentially change the syntax of the query.
+ * Because of UPDATE/DELETE EPQ meachanism, currently Query->resultRelation
+ * or Query->mergeTargetRelation associated rel cannot be eliminated.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL &&
+ varno != root->parse->resultRelation &&
+ varno != root->parse->mergeTargetRelation)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5c3e..075d36c7ecc 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -230,6 +230,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 1c69c6e97e8..c5a8d00bb3e 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -662,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
@@ -670,6 +670,9 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
* be an RTE corresponding to each setop's output.
+ * Note, we use this not subquery's targetlist but subroot->parse's
+ * targetlist, because it was revised by self-join removal. subquery's
+ * targetlist might contain the references to the removed relids.
*/
if (pNumGroups)
{
@@ -682,7 +685,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index 191f2dc0b1d..1d641f6afaf 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -540,6 +540,33 @@ offset_relid_set(Relids relids, int offset)
return result;
}
+/*
+ * Substitute newId by oldId in relids.
+ *
+ * We must make a copy of the original Bitmapset before making any
+ * modifications, because the same pointer to it might be shared among
+ * different places.
+ * Also, this function can be used in 'delete only' mode (newId < 0).
+ * It allows us to utilise the same code in the remove_useless_joins and the
+ * remove_self_joins features.
+ */
+Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (IS_SPECIAL_VARNO(oldId))
+ return relids;
+
+ /* Delete relid without substitution. */
+ if (IS_SPECIAL_VARNO(newId))
+ return bms_del_member(bms_copy(relids), oldId);
+
+ /* Substitute newId for oldId. */
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
/*
* ChangeVarNodes - adjust Var nodes for a specific change of RT index
*
@@ -548,6 +575,7 @@ offset_relid_set(Relids relids, int offset)
* to 'new_index'. The varnosyn fields are changed too. Also, adjust other
* nodes that contain rangetable indexes, such as RangeTblRef and JoinExpr.
*
+ * change_RangeTblRef: do we change RangeTblRef or not. see ChangeVarNodesExtended.
* NOTE: although this has the form of a walker, we cheat and modify the
* nodes in-place. The given expression tree should have been copied
* earlier to ensure that no unwanted side-effects occur!
@@ -558,6 +586,7 @@ typedef struct
int rt_index;
int new_index;
int sublevels_up;
+ bool change_RangeTblRef;
} ChangeVarNodes_context;
static bool
@@ -590,7 +619,7 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
cexpr->cvarno = context->new_index;
return false;
}
- if (IsA(node, RangeTblRef))
+ if (IsA(node, RangeTblRef) && context->change_RangeTblRef)
{
RangeTblRef *rtr = (RangeTblRef *) node;
@@ -637,6 +666,75 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ replace_relid(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ replace_relid(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ replace_relid(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ replace_relid(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ replace_relid(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ replace_relid(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * for SJE (self join elimination), changing varnos will make t1.a = t2.a
+ * to "t1.a = t1.a", which is always true, we can optizmied it out.
+ * to optimzied it out, we use equal to validate,
+ * but we also need to add a not null qual (NullTest).
+ *
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
@@ -670,32 +768,29 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
(void *) context);
}
+/*
+ * almost equivalent to ChangeVarNodes. also see comments in ChangeVarNodes.
+ * difference with ChangeVarNodes:
+ * when we use ChangeVarNodes for node we change all of it's underlying nodes.
+ * but for SJE we don't want to change node type: RangeTblRef, in some occasion.
+ * because SJE last step, remove_rel_from_joinlist is to
+ * remove left node (RangeTblRef) one by one. if in any case while using
+ * ChangeVarNodes, by accidently leaf node RangeTblRef also being updated
+ * then remove_rel_from_joinlist will have error.
+*/
void
-ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef)
{
- ChangeVarNodes_context context;
-
+ ChangeVarNodes_context context;
context.rt_index = rt_index;
context.new_index = new_index;
context.sublevels_up = sublevels_up;
+ context.change_RangeTblRef = change_RangeTblRef;
- /*
- * Must be prepared to start with a Query or a bare expression tree; if
- * it's a Query, go straight to query_tree_walker to make sure that
- * sublevels_up doesn't get incremented prematurely.
- */
if (node && IsA(node, Query))
{
Query *qry = (Query *) node;
-
- /*
- * If we are starting at a Query, and sublevels_up is zero, then we
- * must also fix rangetable indexes in the Query itself --- namely
- * resultRelation, mergeTargetRelation, exclRelIndex and rowMarks
- * entries. sublevels_up cannot be zero when recursing into a
- * subquery, so there's no need to have the same logic inside
- * ChangeVarNodes_walker.
- */
if (sublevels_up == 0)
{
ListCell *l;
@@ -706,7 +801,6 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
if (qry->mergeTargetRelation == rt_index)
qry->mergeTargetRelation = new_index;
- /* this is unlikely to ever be used, but ... */
if (qry->onConflict && qry->onConflict->exclRelIndex == rt_index)
qry->onConflict->exclRelIndex = new_index;
@@ -724,6 +818,11 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
else
ChangeVarNodes_walker(node, &context);
}
+void
+ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+{
+ ChangeVarNodesExtended(node, rt_index, new_index, sublevels_up, true);
+}
/*
* Substitute newrelid for oldrelid in a Relid set
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 630ed0f1629..40db9d25b4b 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -988,6 +988,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 14ccfc1ac1c..56b34ae3e36 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -728,7 +728,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -967,7 +967,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3432,4 +3432,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 7b63c5cf718..4381124f6e8 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -191,6 +191,8 @@ extern SortGroupClause *get_sortgroupref_clause_noerr(Index sortref,
* output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0020 /* recurse into PlaceHolderVar
* arguments */
+#define PVC_INCLUDE_CONVERTROWTYPES 0x0040 /* include ConvertRowtypeExprs in
+ * output list */
extern Bitmapset *pull_varnos(PlannerInfo *root, Node *node);
extern Bitmapset *pull_varnos_of_level(PlannerInfo *root, Node *node, int levelsup);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 5e88c0224a4..fff4b69dfd9 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index aafc1737921..f2e3fa4c2ec 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -108,6 +109,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index ac6d2049e80..074847f1a4c 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -41,11 +42,14 @@ typedef enum ReplaceVarsNoMatchOption
} ReplaceVarsNoMatchOption;
+extern Bitmapset *replace_relid(Relids relids, int oldId, int newId);
extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
List *src_rtable, List *src_perminfos);
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
extern void ChangeVarNodes(Node *node, int rt_index, int new_index,
int sublevels_up);
+void ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef);
extern void IncrementVarSublevelsUp(Node *node, int delta_sublevels_up,
int min_sublevels_up);
extern void IncrementVarSublevelsUp_rtable(List *rtable,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 126f7047fed..3d5de283544 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 53f70d72ed6..4e4cec633af 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6213,6 +6213,1070 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: (oid < '10'::oid)
+(6 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+-------------------------------------------
+ HashSetOp Except All
+ -> Append
+ -> Subquery Scan on "*SELECT* 1"
+ -> Seq Scan on emp1 c2
+ -> Subquery Scan on "*SELECT* 2"
+ -> Seq Scan on emp1 c3
+(6 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 729620de13c..9da8c5013cf 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -153,10 +153,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 247b0a31055..77dd964ebf2 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index d81ff63be53..3e94e0af538 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2345,6 +2345,492 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4d7f9217ce..00f0154cb50 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -390,6 +390,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2566,6 +2567,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -3998,6 +4000,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.3 (Apple Git-145)
On 20/7/2024 18:38, Alexander Korotkov wrote:
On Fri, Jul 19, 2024 at 11:30 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
On Wed, 17 Jul 2024 at 01:45, Alexander Korotkov <aekorotkov@gmail.com> wrote:
We initially didn't use ChangeVarNodes() in SJE at all. See the last
patch version without it [1]. We're trying to address Tom Lane's
proposal to re-use more of existing tree-manipulation infrastructure
[2]. I agree with you that the case with ChangeVarNodes() looks
questionable. Do you have other ideas how we can re-use some more of
existing tree-manipulation infrastructure in SJE?
As I can see, ChangeVarNodes is dedicated to working with the query tree
before the planning phase. SJE works right in the middle of the planning
process. So, it may be more practical to keep it separate as a walker,
as Dean has proposed. If the optimisation stuff changes, the walker code
will be changed, too.
--
regards, Andrei Lepikhov
On Sat, 20 Jul 2024 at 12:39, Alexander Korotkov <aekorotkov@gmail.com> wrote:
The new function replace_relid() looks to be the same as adjust_relid_set().
They are similar, not the same. replace_relid() has handling for
negative newId, while adjust_relid_set() hasn't. One thing I'd like
to borrow from adjust_relid_set() to replace_relid() is the usage of
IS_SPECIAL_VARNO() macro.
Ah, that makes sense. In that case, I'd say that replace_relid()
should go in analyzejoins.c (and be a local function there), since
that's the only place that requires this special negative newId
handling.
It would be probably nice to move this logic into bms_replace_member()
residing at bitmapset.c. What do you think?
Maybe. It feels a little specialised though, so maybe it's not worth the effort.
I have been reviewing more of the patch, mainly focusing on the logic
in analyzejoins.c that decides when to apply SJE.
I understand broadly what the code is doing, but I still find it
somewhat hard to follow. One thing that makes it hard is that in
analyzejoins.c, "inner" and "outer" get swapped round at various
points. For example generate_join_implied_equalities() is defined like
this:
List *
generate_join_implied_equalities(PlannerInfo *root,
Relids join_relids,
Relids outer_relids,
RelOptInfo *inner_rel,
SpecialJoinInfo *sjinfo);
but remove_self_joins_one_group() calls it like this:
restrictlist = generate_join_implied_equalities(root, joinrelids,
inner->relids,
outer, NULL);
So you have to remember that "inner" is "outer" and "outer" is "inner"
when going into generate_join_implied_equalities() from
remove_self_joins_one_group(). And the same thing happens when calling
innerrel_is_unique_ext() and match_unique_clauses(). I think all that
could be resolved by swapping "inner" and "outer" in the variable
names and comments in remove_self_joins_one_group().
Another thing I noticed in remove_self_joins_one_group() was this:
/*
* To enable SJE for the only degenerate case without any self
* join clauses at all, add baserestrictinfo to this list. The
* degenerate case works only if both sides have the same clause.
* So doesn't matter which side to add.
*/
selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
That appears to be pointless, because is_innerrel_unique_for() will
filter the restrictlist it is given, removing those baserestrictinfo
clauses (because I think they'll always have can_join = false). And
then relation_has_unique_index_ext() will re-add them:
/*
* Examine the rel's restriction clauses for usable var = const clauses
* that we can add to the restrictlist.
*/
foreach(ic, rel->baserestrictinfo)
{
... add suitable clauses
}
where "rel" is "innerrel" from is_innerrel_unique_for(), which is
"outer" from remove_self_joins_one_group(), so it's the same set of
baserestrictinfo clauses.
Something else that looks a little messy is this in innerrel_is_unique_ext():
/*
* innerrel_is_unique_ext
* Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
* additional clauses from a baserestrictinfo list that were used to prove
* uniqueness. A non NULL 'extra_clauses' indicates that we're checking
* for self-join and correspondingly dealing with filtered clauses.
*/
bool
innerrel_is_unique_ext(PlannerInfo *root,
...
List **extra_clauses)
{
bool self_join = (extra_clauses != NULL);
[logic depending on self_join]
}
This presumes that any caller interested in knowing the extra
baserestrictinfo clauses used to prove uniqueness must be looking at a
self join. That may be true today, but it doesn't seem like a good API
design choice. I think it would be better to just add "self_join" as
an extra parameter, and also maybe have the function return the
UniqueRelInfo containing the "extra_clauses", or NULL if it's not
unique. That way, it would be more extensible, if we wanted it to
return more information in the future.
Instead of adding relation_has_unique_index_ext(), maybe it would be
OK to just change the signature of relation_has_unique_index_for(). It
looks like it's only called from one other place outside
analyzejoins.c. Perhaps the same is true for innerrel_is_unique_ext().
Should match_unique_clauses() be comparing mergeopfamilies or opnos to
ensure that the clauses are using the same equality operator?
Regards,
Dean
On 7/24/24 18:07, Dean Rasheed wrote:
On Sat, 20 Jul 2024 at 12:39, Alexander Korotkov <aekorotkov@gmail.com> wrote:
The new function replace_relid() looks to be the same as adjust_relid_set().
They are similar, not the same. replace_relid() has handling for
negative newId, while adjust_relid_set() hasn't. One thing I'd like
to borrow from adjust_relid_set() to replace_relid() is the usage of
IS_SPECIAL_VARNO() macro.Ah, that makes sense. In that case, I'd say that replace_relid()
should go in analyzejoins.c (and be a local function there), since
that's the only place that requires this special negative newId
handling.
Done. See attachment.
So you have to remember that "inner" is "outer" and "outer" is "inner"
when going into generate_join_implied_equalities() from
remove_self_joins_one_group(). And the same thing happens when calling
innerrel_is_unique_ext() and match_unique_clauses(). I think all that
could be resolved by swapping "inner" and "outer" in the variable
names and comments in remove_self_joins_one_group().
As you can see, when you dive into the implementation of the
generate_join_implied_equalities, an inner join works symmetrically. We
don't need any other clauses here, except mentioning both inner and
outer to prove self-join. So, it doesn't matter which side to choose as
an inner - all the effect can be: instead of x1=x2, we will have clause
x2=x1.
selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
That appears to be pointless, because is_innerrel_unique_for() will
filter the restrictlist it is given, removing those baserestrictinfo
clauses (because I think they'll always have can_join = false). And
then relation_has_unique_index_ext() will re-add them:
Yeah, we used to have it to work out the case, then restrictlist == NIL.
I agree, it may be a bit strange and too expensive. Let's add a flag
check inside the innerrel_is_unique_ext instead.
This presumes that any caller interested in knowing the extra
baserestrictinfo clauses used to prove uniqueness must be looking at a
self join. That may be true today, but it doesn't seem like a good API
design choice. I think it would be better to just add "self_join" as
an extra parameter, and also maybe have the function return the
UniqueRelInfo containing the "extra_clauses", or NULL if it's not
unique. That way, it would be more extensible, if we wanted it to
return more information in the future.
If more users are interested in this function, we can just rename the
flag self_join to something more universal - extra_prove, I'm not sure.
Instead of adding relation_has_unique_index_ext(), maybe it would be
OK to just change the signature of relation_has_unique_index_for(). It
looks like it's only called from one other place outside
analyzejoins.c. Perhaps the same is true for innerrel_is_unique_ext().
I vote against it remembering of extensions and broken APIs.
Should match_unique_clauses() be comparing mergeopfamilies or opnos to
ensure that the clauses are using the same equality operator?
Having the same variable side, same constant, unique index? I can
imagine only a self-made algebra, which can break something here, but it
looks like over-engineering, right?
Attached version is rebased onto the new master. Because of a3179ab it
may be necessary to do more detailed review.
--
regards, Andrei Lepikhov
Attachments:
v8-0001-Remove-useless-self-joins.patchtext/x-patch; charset=UTF-8; name=v8-0001-Remove-useless-self-joins.patchDownload
From 358aafafc9f61b52c39d2a754e7b323ada00b78b Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Mon, 7 Oct 2024 10:43:04 +0700
Subject: [PATCH v8] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self-join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, the inner restrictlist moves to the outer one, and duplicated
clauses are removed. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces the
number of restriction clauses === selectivity estimations, and potentially can
improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch, we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baserestrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
possibility of self-join elimination, the inner and outer clauses must match
exactly.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/equivclass.c | 3 +-
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1157 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 7 +-
src/backend/rewrite/rewriteManip.c | 110 +-
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 35 +-
src/include/optimizer/optimizer.h | 2 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/include/rewrite/rewriteManip.h | 3 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1079 +++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 490 +++++++++
src/tools/pgindent/typedefs.list | 3 +
19 files changed, 2918 insertions(+), 99 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 9707d5238d..cf0dee60f0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5622,6 +5622,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 8f6f005ecb..ce5d8fc3ac 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -860,7 +860,8 @@ find_computable_ec_member(PlannerInfo *root,
exprvars = pull_var_clause((Node *) em->em_expr,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ PVC_INCLUDE_PLACEHOLDERS |
+ PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc2, exprvars)
{
if (!is_exprlist_member(lfirst(lc2), exprs))
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c0fcc7d78d..2230b13104 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3440,6 +3440,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3495,6 +3511,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3546,6 +3563,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3588,7 +3623,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 928d926645..c4343c3dc2 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -30,12 +31,30 @@
#include "optimizer/placeholder.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. It is needed to perform a sorting procedure and simplify
+ * the search of SJE-candidate baserels referencing the same database relation.
+ * Having collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts to
+ * remove self-joins.
+ * Preliminary sorting prevents quadratic behaviour that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -43,14 +62,16 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -88,7 +109,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -292,8 +313,8 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
continue; /* not mergejoinable */
/*
- * Check if the clause has the form "outer op inner" or "inner op
- * outer", and if so mark which side is inner.
+ * Check if clause has the form "outer op inner" or "inner op outer",
+ * and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
innerrel->relids))
@@ -307,7 +328,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -319,36 +340,69 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Substitute newId by oldId in relids.
+ * Works almost like the adjust_relid_set, but process negative newrelid case.
+ *
+ * We must make a copy of the original Bitmapset before making any
+ * modifications, because the same pointer to it might be shared among
+ * different places.
+ * Also, this function can be used in 'delete only' mode (newId < 0).
+ * It allows us to utilise the same code in the remove_useless_joins and the
+ * remove_self_joins features.
+ */
+static Bitmapset *
+replace_relid(Relids relids, int oldrelid, int newrelid)
+{
+ if (IS_SPECIAL_VARNO(oldrelid))
+ return relids;
+
+ /* Delete relid without substitution. */
+ if (IS_SPECIAL_VARNO(newrelid))
+ {
+ relids = bms_copy(relids);
+ return bms_del_member(relids, oldrelid);
+ }
+
+ /* Substitute newId for oldId. */
+ if (bms_is_member(oldrelid, relids))
+ {
+ /* Ensure we have a modifiable copy */
+ relids = bms_copy(relids);
+
+ /* Remove old, add new */
+ relids = bms_del_member(relids, oldrelid);
+ return bms_add_member(relids, newrelid);
+ }
+
+ return relids;
+}
+
+/*
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -362,30 +416,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- /*
- * initsplan.c is fairly cavalier about allowing SpecialJoinInfos'
- * lefthand/righthand relid sets to be shared with other data
- * structures. Ensure that we don't modify the original relid sets.
- * (The commute_xxx sets are always per-SpecialJoinInfo though.)
- */
- sjinf->min_lefthand = bms_copy(sjinf->min_lefthand);
- sjinf->min_righthand = bms_copy(sjinf->min_righthand);
- sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
- sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
+
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
}
/*
@@ -406,10 +453,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -419,21 +466,73 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
/* Reduce ph_needed to contain only "relation 0"; see below */
if (bms_is_member(0, phinfo->ph_needed))
phinfo->ph_needed = bms_make_singleton(0);
else
phinfo->ph_needed = NULL;
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[rti];
+
+ if (otherrel == NULL)
+ continue;
+
+ /* Update lateral_vars list. */
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
+ }
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ Index rti;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
+
/*
* Remove any joinquals referencing the rel from the joininfo lists.
*
@@ -875,9 +974,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * extra_clauses contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the extra_clauses, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -890,10 +994,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1207,12 +1312,40 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
+ * additional clauses from a baserestrictinfo list that were used to prove
+ * uniqueness. A non NULL 'extra_clauses' indicates that we're checking
+ * for self-join and correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
- /* Certainly can't prove uniqueness when there are no joinclauses */
- if (restrictlist == NIL)
+ /*
+ * Don't waste cycles attempting to prove uniqueness when there are no join
+ * clauses. Still, try to do it in an attempt to remove a join because a
+ * degenerate case can take place and may be profitable for the performance.
+ */
+ if (restrictlist == NIL && !self_join)
return false;
/*
@@ -1224,17 +1357,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels, because we need extra clauses to be valid for our case.
+ * Also, for self-join checking we've filtered the clauses list. Thus,
+ * for a self-join search, we can match only the result cached for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1251,7 +1395,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1264,10 +1409,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1313,7 +1464,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1343,17 +1495,874 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+
+ foreach_node(EquivalenceMember, em, ec->ec_members)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach_node(EquivalenceMember, other, new_members)
+ {
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach_node(RestrictInfo, rinfo, ec->ec_sources)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach_node(RestrictInfo, other, new_sources)
+ {
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach_node(RestrictInfo, rinfo, joininfos)
+ {
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach_node(RestrictInfo, rinfo, toRemove->baserestrictinfo)
+ {
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach_node(RestrictInfo, rinfo, binfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->baserestrictinfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach_node(RestrictInfo, rinfo, jinfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->joininfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures, except nodes RangeTblRef
+ * otherwise later remove_rel_from_joinlist will yield errors.
+ */
+ ChangeVarNodesExtended((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ replace_relid(root->all_result_relids, toRemove->relid, toKeep->relid);
+ replace_relid(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach_node(RestrictInfo, rinfo, joinquals)
+ {
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ foreach_node(RestrictInfo, rinfo, uclauses)
+ {
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach_node(RestrictInfo, orinfo, outer->baserestrictinfo)
+ {
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ Assert(list_length(restrictlist) ==
+ (list_length(selfjoinquals) + list_length(otherjoinquals)));
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ false, &uclauses))
+ continue;
+
+ /*
+ * uclauses is the copy of outer->baserestrictinfo that associated with a index.
+ * We already proven that based on {outer->baserestrictinfo, selfjoinquals}
+ * comparsion with unique index we can guarantees that for the outer rel there is
+ * at most one row where columns equal given values. Sometimes that is not ok. e.g.
+ * "where s1.b = s2.b and s1.a = 1 and s2.a = 2". (the index is based on column(a,b).
+ * in these scarenio, we also need to validate equality of outer->baserestrictinfo
+ * with inner->outer->baserestrictinfo,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be removed,
+ * and these relations should not have TABLESAMPLE clauses
+ * specified. Removing a relation with TABLESAMPLE clause could
+ * potentially change the syntax of the query.
+ * Because of UPDATE/DELETE EPQ meachanism, currently Query->resultRelation
+ * or Query->mergeTargetRelation associated rel cannot be eliminated.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL &&
+ varno != root->parse->resultRelation &&
+ varno != root->parse->mergeTargetRelation)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5c3..075d36c7ec 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -230,6 +230,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index a0baf6d4a1..6a519ccdec 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -662,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
@@ -670,6 +670,9 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
* be an RTE corresponding to each setop's output.
+ * Note, we use this not subquery's targetlist but subroot->parse's
+ * targetlist, because it was revised by self-join removal. subquery's
+ * targetlist might contain the references to the removed relids.
*/
if (pNumGroups)
{
@@ -682,7 +685,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index b20625fbd2..13d6cfddca 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -548,6 +548,7 @@ offset_relid_set(Relids relids, int offset)
* to 'new_index'. The varnosyn fields are changed too. Also, adjust other
* nodes that contain rangetable indexes, such as RangeTblRef and JoinExpr.
*
+ * change_RangeTblRef: do we change RangeTblRef or not. see ChangeVarNodesExtended.
* NOTE: although this has the form of a walker, we cheat and modify the
* nodes in-place. The given expression tree should have been copied
* earlier to ensure that no unwanted side-effects occur!
@@ -558,6 +559,7 @@ typedef struct
int rt_index;
int new_index;
int sublevels_up;
+ bool change_RangeTblRef;
} ChangeVarNodes_context;
static bool
@@ -590,7 +592,7 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
cexpr->cvarno = context->new_index;
return false;
}
- if (IsA(node, RangeTblRef))
+ if (IsA(node, RangeTblRef) && context->change_RangeTblRef)
{
RangeTblRef *rtr = (RangeTblRef *) node;
@@ -637,6 +639,75 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ adjust_relid_set(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ adjust_relid_set(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ adjust_relid_set(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ adjust_relid_set(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ adjust_relid_set(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ adjust_relid_set(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * for SJE (self join elimination), changing varnos will make t1.a = t2.a
+ * to "t1.a = t1.a", which is always true, we can optizmied it out.
+ * to optimzied it out, we use equal to validate,
+ * but we also need to add a not null qual (NullTest).
+ *
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
@@ -670,32 +741,29 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
(void *) context);
}
+/*
+ * almost equivalent to ChangeVarNodes. also see comments in ChangeVarNodes.
+ * difference with ChangeVarNodes:
+ * when we use ChangeVarNodes for node we change all of it's underlying nodes.
+ * but for SJE we don't want to change node type: RangeTblRef, in some occasion.
+ * because SJE last step, remove_rel_from_joinlist is to
+ * remove left node (RangeTblRef) one by one. if in any case while using
+ * ChangeVarNodes, by accidently leaf node RangeTblRef also being updated
+ * then remove_rel_from_joinlist will have error.
+*/
void
-ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef)
{
- ChangeVarNodes_context context;
-
+ ChangeVarNodes_context context;
context.rt_index = rt_index;
context.new_index = new_index;
context.sublevels_up = sublevels_up;
+ context.change_RangeTblRef = change_RangeTblRef;
- /*
- * Must be prepared to start with a Query or a bare expression tree; if
- * it's a Query, go straight to query_tree_walker to make sure that
- * sublevels_up doesn't get incremented prematurely.
- */
if (node && IsA(node, Query))
{
Query *qry = (Query *) node;
-
- /*
- * If we are starting at a Query, and sublevels_up is zero, then we
- * must also fix rangetable indexes in the Query itself --- namely
- * resultRelation, mergeTargetRelation, exclRelIndex and rowMarks
- * entries. sublevels_up cannot be zero when recursing into a
- * subquery, so there's no need to have the same logic inside
- * ChangeVarNodes_walker.
- */
if (sublevels_up == 0)
{
ListCell *l;
@@ -706,7 +774,6 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
if (qry->mergeTargetRelation == rt_index)
qry->mergeTargetRelation = new_index;
- /* this is unlikely to ever be used, but ... */
if (qry->onConflict && qry->onConflict->exclRelIndex == rt_index)
qry->onConflict->exclRelIndex = new_index;
@@ -724,6 +791,11 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
else
ChangeVarNodes_walker(node, &context);
}
+void
+ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+{
+ ChangeVarNodesExtended(node, rt_index, new_index, sublevels_up, true);
+}
/*
* Substitute newrelid for oldrelid in a Relid set
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 686309db58..24c69674ea 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -986,6 +986,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 07e2415398..f1e5df18c2 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -734,7 +734,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -973,7 +973,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3440,4 +3440,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 93e3dc719d..550e001dc0 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -191,6 +191,8 @@ extern SortGroupClause *get_sortgroupref_clause_noerr(Index sortref,
* output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0020 /* recurse into PlaceHolderVar
* arguments */
+#define PVC_INCLUDE_CONVERTROWTYPES 0x0040 /* include ConvertRowtypeExprs in
+ * output list */
extern Bitmapset *pull_varnos(PlannerInfo *root, Node *node);
extern Bitmapset *pull_varnos_of_level(PlannerInfo *root, Node *node, int levelsup);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 54869d4401..64ba89fcb8 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 93137261e4..bcd67f3791 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -112,6 +113,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index ac6d2049e8..710ec7b1b7 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -46,6 +47,8 @@ extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
extern void ChangeVarNodes(Node *node, int rt_index, int new_index,
int sublevels_up);
+void ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef);
extern void IncrementVarSublevelsUp(Node *node, int delta_sublevels_up,
int min_sublevels_up);
extern void IncrementVarSublevelsUp_rtable(List *rtable,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index a328164fe0..f328e92b08 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 12abd3a0e7..030470fa4e 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6223,6 +6223,1085 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: (oid < '10'::oid)
+(6 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is derived from
+-- baserestrictinfo clauses.
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Two approaches to write a join. In both cases it is sel-join which addresses
+-- the same physical tuple and can be removed from the join search space.
+EXPLAIN (COSTS OFF) SELECT * FROM sj j1, sj j2 WHERE j1.a = 3 AND j2.a = 3;
+ QUERY PLAN
+-------------------
+ Seq Scan on sj j2
+ Filter: (a = 3)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT * FROM sj j1 JOIN sj j2 ON (j1.a = 3 AND j2.a = 3);
+ QUERY PLAN
+-------------------
+ Seq Scan on sj j2
+ Filter: (a = 3)
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+-------------------------------------------
+ HashSetOp Except All
+ -> Append
+ -> Subquery Scan on "*SELECT* 1"
+ -> Seq Scan on emp1 c2
+ -> Subquery Scan on "*SELECT* 2"
+ -> Seq Scan on emp1 c3
+(6 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index fad7fc3a7e..7fcc07c20c 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -167,10 +167,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 28ed7910d0..7fc2159349 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 0c65e5af4b..4e429750af 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2350,6 +2350,496 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJE corner case: uniqueness of an inner is derived from
+-- baserestrictinfo clauses.
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Two approaches to write a join. In both cases it is sel-join which addresses
+-- the same physical tuple and can be removed from the join search space.
+EXPLAIN (COSTS OFF) SELECT * FROM sj j1, sj j2 WHERE j1.a = 3 AND j2.a = 3;
+EXPLAIN (COSTS OFF) SELECT * FROM sj j1 JOIN sj j2 ON (j1.a = 3 AND j2.a = 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c4de597b1f..036e2d717b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -391,6 +391,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2573,6 +2574,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -4016,6 +4018,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.5
On 10/8/24 10:54, Andrei Lepikhov wrote:
On 7/24/24 18:07, Dean Rasheed wrote:
On Sat, 20 Jul 2024 at 12:39, Alexander Korotkov
<aekorotkov@gmail.com> wrote:Attached version is rebased onto the new master. Because of a3179ab it
may be necessary to do more detailed review.
Just a rebase on the current master.
--
regards, Andrei Lepikhov
Attachments:
v9-0001-Remove-useless-self-joins.patchtext/x-patch; charset=UTF-8; name=v9-0001-Remove-useless-self-joins.patchDownload
From ebaefc61c3eaa92ee554fc27d505d79475cf4e48 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Mon, 7 Oct 2024 10:43:04 +0700
Subject: [PATCH v9] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self-join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, the inner restrictlist moves to the outer one, and duplicated
clauses are removed. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces the
number of restriction clauses === selectivity estimations, and potentially can
improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch, we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baserestrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
possibility of self-join elimination, the inner and outer clauses must match
exactly.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/equivclass.c | 3 +-
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1157 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 7 +-
src/backend/rewrite/rewriteManip.c | 110 +-
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 35 +-
src/include/optimizer/optimizer.h | 2 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/include/rewrite/rewriteManip.h | 3 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1079 +++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 490 +++++++++
src/tools/pgindent/typedefs.list | 3 +
19 files changed, 2918 insertions(+), 99 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index dc401087dc..777a479e0c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5649,6 +5649,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index fae137dd82..13d751c7b4 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -880,7 +880,8 @@ find_computable_ec_member(PlannerInfo *root,
emvars = pull_var_clause((Node *) em->em_expr,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ PVC_INCLUDE_PLACEHOLDERS |
+ PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc2, emvars)
{
if (!list_member(exprvars, lfirst(lc2)))
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c0fcc7d78d..2230b13104 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3440,6 +3440,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3495,6 +3511,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3546,6 +3563,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3588,7 +3623,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 5bc16c4bfc..03cba3cb40 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -30,12 +31,30 @@
#include "optimizer/placeholder.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. It is needed to perform a sorting procedure and simplify
+ * the search of SJE-candidate baserels referencing the same database relation.
+ * Having collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts to
+ * remove self-joins.
+ * Preliminary sorting prevents quadratic behaviour that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -43,14 +62,16 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -88,7 +109,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -261,8 +282,8 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
continue; /* not mergejoinable */
/*
- * Check if the clause has the form "outer op inner" or "inner op
- * outer", and if so mark which side is inner.
+ * Check if clause has the form "outer op inner" or "inner op outer",
+ * and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
innerrel->relids))
@@ -276,7 +297,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -288,36 +309,69 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Substitute newId by oldId in relids.
+ * Works almost like the adjust_relid_set, but process negative newrelid case.
+ *
+ * We must make a copy of the original Bitmapset before making any
+ * modifications, because the same pointer to it might be shared among
+ * different places.
+ * Also, this function can be used in 'delete only' mode (newId < 0).
+ * It allows us to utilise the same code in the remove_useless_joins and the
+ * remove_self_joins features.
+ */
+static Bitmapset *
+replace_relid(Relids relids, int oldrelid, int newrelid)
+{
+ if (IS_SPECIAL_VARNO(oldrelid))
+ return relids;
+
+ /* Delete relid without substitution. */
+ if (IS_SPECIAL_VARNO(newrelid))
+ {
+ relids = bms_copy(relids);
+ return bms_del_member(relids, oldrelid);
+ }
+
+ /* Substitute newId for oldId. */
+ if (bms_is_member(oldrelid, relids))
+ {
+ /* Ensure we have a modifiable copy */
+ relids = bms_copy(relids);
+
+ /* Remove old, add new */
+ relids = bms_del_member(relids, oldrelid);
+ return bms_add_member(relids, newrelid);
+ }
+
+ return relids;
+}
+
+/*
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -331,30 +385,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- /*
- * initsplan.c is fairly cavalier about allowing SpecialJoinInfos'
- * lefthand/righthand relid sets to be shared with other data
- * structures. Ensure that we don't modify the original relid sets.
- * (The commute_xxx sets are always per-SpecialJoinInfo though.)
- */
- sjinf->min_lefthand = bms_copy(sjinf->min_lefthand);
- sjinf->min_righthand = bms_copy(sjinf->min_righthand);
- sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
- sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
+
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
}
/*
@@ -375,10 +422,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -388,21 +435,73 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
/* Reduce ph_needed to contain only "relation 0"; see below */
if (bms_is_member(0, phinfo->ph_needed))
phinfo->ph_needed = bms_make_singleton(0);
else
phinfo->ph_needed = NULL;
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[rti];
+
+ if (otherrel == NULL)
+ continue;
+
+ /* Update lateral_vars list. */
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
+ }
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ Index rti;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
+
/*
* Remove any joinquals referencing the rel from the joininfo lists.
*
@@ -844,9 +943,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * extra_clauses contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the extra_clauses, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +963,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,12 +1281,40 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
+ * additional clauses from a baserestrictinfo list that were used to prove
+ * uniqueness. A non NULL 'extra_clauses' indicates that we're checking
+ * for self-join and correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
- /* Certainly can't prove uniqueness when there are no joinclauses */
- if (restrictlist == NIL)
+ /*
+ * Don't waste cycles attempting to prove uniqueness when there are no join
+ * clauses. Still, try to do it in an attempt to remove a join because a
+ * degenerate case can take place and may be profitable for the performance.
+ */
+ if (restrictlist == NIL && !self_join)
return false;
/*
@@ -1193,17 +1326,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels, because we need extra clauses to be valid for our case.
+ * Also, for self-join checking we've filtered the clauses list. Thus,
+ * for a self-join search, we can match only the result cached for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1364,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1378,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1433,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1312,17 +1464,874 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+
+ foreach_node(EquivalenceMember, em, ec->ec_members)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach_node(EquivalenceMember, other, new_members)
+ {
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach_node(RestrictInfo, rinfo, ec->ec_sources)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach_node(RestrictInfo, other, new_sources)
+ {
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach_node(RestrictInfo, rinfo, joininfos)
+ {
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach_node(RestrictInfo, rinfo, toRemove->baserestrictinfo)
+ {
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach_node(RestrictInfo, rinfo, binfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->baserestrictinfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach_node(RestrictInfo, rinfo, jinfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->joininfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /* Replace varno in all the query structures, except nodes RangeTblRef
+ * otherwise later remove_rel_from_joinlist will yield errors.
+ */
+ ChangeVarNodesExtended((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ replace_relid(root->all_result_relids, toRemove->relid, toKeep->relid);
+ replace_relid(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach_node(RestrictInfo, rinfo, joinquals)
+ {
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ foreach_node(RestrictInfo, rinfo, uclauses)
+ {
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach_node(RestrictInfo, orinfo, outer->baserestrictinfo)
+ {
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ Assert(list_length(restrictlist) ==
+ (list_length(selfjoinquals) + list_length(otherjoinquals)));
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ false, &uclauses))
+ continue;
+
+ /*
+ * uclauses is the copy of outer->baserestrictinfo that associated with a index.
+ * We already proven that based on {outer->baserestrictinfo, selfjoinquals}
+ * comparsion with unique index we can guarantees that for the outer rel there is
+ * at most one row where columns equal given values. Sometimes that is not ok. e.g.
+ * "where s1.b = s2.b and s1.a = 1 and s2.a = 2". (the index is based on column(a,b).
+ * in these scarenio, we also need to validate equality of outer->baserestrictinfo
+ * with inner->outer->baserestrictinfo,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be removed,
+ * and these relations should not have TABLESAMPLE clauses
+ * specified. Removing a relation with TABLESAMPLE clause could
+ * potentially change the syntax of the query.
+ * Because of UPDATE/DELETE EPQ meachanism, currently Query->resultRelation
+ * or Query->mergeTargetRelation associated rel cannot be eliminated.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL &&
+ varno != root->parse->resultRelation &&
+ varno != root->parse->mergeTargetRelation)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5c3..075d36c7ec 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -230,6 +230,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index a0baf6d4a1..6a519ccdec 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -662,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
@@ -670,6 +670,9 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
* be an RTE corresponding to each setop's output.
+ * Note, we use this not subquery's targetlist but subroot->parse's
+ * targetlist, because it was revised by self-join removal. subquery's
+ * targetlist might contain the references to the removed relids.
*/
if (pNumGroups)
{
@@ -682,7 +685,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index 8f90afb326..5058a700ee 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -548,6 +548,7 @@ offset_relid_set(Relids relids, int offset)
* to 'new_index'. The varnosyn fields are changed too. Also, adjust other
* nodes that contain rangetable indexes, such as RangeTblRef and JoinExpr.
*
+ * change_RangeTblRef: do we change RangeTblRef or not. see ChangeVarNodesExtended.
* NOTE: although this has the form of a walker, we cheat and modify the
* nodes in-place. The given expression tree should have been copied
* earlier to ensure that no unwanted side-effects occur!
@@ -558,6 +559,7 @@ typedef struct
int rt_index;
int new_index;
int sublevels_up;
+ bool change_RangeTblRef;
} ChangeVarNodes_context;
static bool
@@ -590,7 +592,7 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
cexpr->cvarno = context->new_index;
return false;
}
- if (IsA(node, RangeTblRef))
+ if (IsA(node, RangeTblRef) && context->change_RangeTblRef)
{
RangeTblRef *rtr = (RangeTblRef *) node;
@@ -637,6 +639,75 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ adjust_relid_set(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ adjust_relid_set(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ adjust_relid_set(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ adjust_relid_set(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ adjust_relid_set(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ adjust_relid_set(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * for SJE (self join elimination), changing varnos will make t1.a = t2.a
+ * to "t1.a = t1.a", which is always true, we can optizmied it out.
+ * to optimzied it out, we use equal to validate,
+ * but we also need to add a not null qual (NullTest).
+ *
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
@@ -670,32 +741,29 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
(void *) context);
}
+/*
+ * almost equivalent to ChangeVarNodes. also see comments in ChangeVarNodes.
+ * difference with ChangeVarNodes:
+ * when we use ChangeVarNodes for node we change all of it's underlying nodes.
+ * but for SJE we don't want to change node type: RangeTblRef, in some occasion.
+ * because SJE last step, remove_rel_from_joinlist is to
+ * remove left node (RangeTblRef) one by one. if in any case while using
+ * ChangeVarNodes, by accidently leaf node RangeTblRef also being updated
+ * then remove_rel_from_joinlist will have error.
+*/
void
-ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef)
{
- ChangeVarNodes_context context;
-
+ ChangeVarNodes_context context;
context.rt_index = rt_index;
context.new_index = new_index;
context.sublevels_up = sublevels_up;
+ context.change_RangeTblRef = change_RangeTblRef;
- /*
- * Must be prepared to start with a Query or a bare expression tree; if
- * it's a Query, go straight to query_tree_walker to make sure that
- * sublevels_up doesn't get incremented prematurely.
- */
if (node && IsA(node, Query))
{
Query *qry = (Query *) node;
-
- /*
- * If we are starting at a Query, and sublevels_up is zero, then we
- * must also fix rangetable indexes in the Query itself --- namely
- * resultRelation, mergeTargetRelation, exclRelIndex and rowMarks
- * entries. sublevels_up cannot be zero when recursing into a
- * subquery, so there's no need to have the same logic inside
- * ChangeVarNodes_walker.
- */
if (sublevels_up == 0)
{
ListCell *l;
@@ -706,7 +774,6 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
if (qry->mergeTargetRelation == rt_index)
qry->mergeTargetRelation = new_index;
- /* this is unlikely to ever be used, but ... */
if (qry->onConflict && qry->onConflict->exclRelIndex == rt_index)
qry->onConflict->exclRelIndex = new_index;
@@ -724,6 +791,11 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
else
ChangeVarNodes_walker(node, &context);
}
+void
+ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+{
+ ChangeVarNodesExtended(node, rt_index, new_index, sublevels_up, true);
+}
/*
* Substitute newrelid for oldrelid in a Relid set
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 8a67f01200..99212aeec8 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -986,6 +986,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index add0f9e45f..1e9d3a0174 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -734,7 +734,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -973,7 +973,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3443,4 +3443,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 93e3dc719d..550e001dc0 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -191,6 +191,8 @@ extern SortGroupClause *get_sortgroupref_clause_noerr(Index sortref,
* output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0020 /* recurse into PlaceHolderVar
* arguments */
+#define PVC_INCLUDE_CONVERTROWTYPES 0x0040 /* include ConvertRowtypeExprs in
+ * output list */
extern Bitmapset *pull_varnos(PlannerInfo *root, Node *node);
extern Bitmapset *pull_varnos_of_level(PlannerInfo *root, Node *node, int levelsup);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 54869d4401..64ba89fcb8 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 93137261e4..bcd67f3791 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -112,6 +113,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index ac6d2049e8..710ec7b1b7 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -46,6 +47,8 @@ extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
extern void ChangeVarNodes(Node *node, int rt_index, int new_index,
int sublevels_up);
+void ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef);
extern void IncrementVarSublevelsUp(Node *node, int delta_sublevels_up,
int min_sublevels_up);
extern void IncrementVarSublevelsUp_rtable(List *rtable,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index a328164fe0..f328e92b08 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9b2973694f..f9961f0824 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6293,6 +6293,1085 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: (oid < '10'::oid)
+(6 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is derived from
+-- baserestrictinfo clauses.
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Two approaches to write a join. In both cases it is sel-join which addresses
+-- the same physical tuple and can be removed from the join search space.
+EXPLAIN (COSTS OFF) SELECT * FROM sj j1, sj j2 WHERE j1.a = 3 AND j2.a = 3;
+ QUERY PLAN
+-------------------
+ Seq Scan on sj j2
+ Filter: (a = 3)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT * FROM sj j1 JOIN sj j2 ON (j1.a = 3 AND j2.a = 3);
+ QUERY PLAN
+-------------------
+ Seq Scan on sj j2
+ Filter: (a = 3)
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+-------------------------------------------
+ HashSetOp Except All
+ -> Append
+ -> Subquery Scan on "*SELECT* 1"
+ -> Seq Scan on emp1 c2
+ -> Subquery Scan on "*SELECT* 2"
+ -> Seq Scan on emp1 c3
+(6 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index fad7fc3a7e..7fcc07c20c 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -167,10 +167,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 28ed7910d0..7fc2159349 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 4c9c3e9f49..689c762b4d 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2379,6 +2379,496 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJE corner case: uniqueness of an inner is derived from
+-- baserestrictinfo clauses.
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Two approaches to write a join. In both cases it is sel-join which addresses
+-- the same physical tuple and can be removed from the join search space.
+EXPLAIN (COSTS OFF) SELECT * FROM sj j1, sj j2 WHERE j1.a = 3 AND j2.a = 3;
+EXPLAIN (COSTS OFF) SELECT * FROM sj j1 JOIN sj j2 ON (j1.a = 3 AND j2.a = 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 171a7dd5d2..79f4cba128 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -391,6 +391,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2575,6 +2576,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -4023,6 +4025,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.5
On Tue, Oct 8, 2024 at 6:54 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 7/24/24 18:07, Dean Rasheed wrote:
So you have to remember that "inner" is "outer" and "outer" is "inner"
when going into generate_join_implied_equalities() from
remove_self_joins_one_group(). And the same thing happens when calling
innerrel_is_unique_ext() and match_unique_clauses(). I think all that
could be resolved by swapping "inner" and "outer" in the variable
names and comments in remove_self_joins_one_group().As you can see, when you dive into the implementation of the
generate_join_implied_equalities, an inner join works symmetrically. We
don't need any other clauses here, except mentioning both inner and
outer to prove self-join. So, it doesn't matter which side to choose as
an inner - all the effect can be: instead of x1=x2, we will have clause
x2=x1.
I think switching places of inner and outer still makes sense in terms
of readability. I've done so for v10. Also, added a comment
regarding symmetry of arguments for generate_join_implied_equalities()
call.
This presumes that any caller interested in knowing the extra
baserestrictinfo clauses used to prove uniqueness must be looking at a
self join. That may be true today, but it doesn't seem like a good API
design choice. I think it would be better to just add "self_join" as
an extra parameter, and also maybe have the function return the
UniqueRelInfo containing the "extra_clauses", or NULL if it's not
unique. That way, it would be more extensible, if we wanted it to
return more information in the future.If more users are interested in this function, we can just rename the
flag self_join to something more universal - extra_prove, I'm not sure.Instead of adding relation_has_unique_index_ext(), maybe it would be
OK to just change the signature of relation_has_unique_index_for(). It
looks like it's only called from one other place outside
analyzejoins.c. Perhaps the same is true for innerrel_is_unique_ext().I vote against it remembering of extensions and broken APIs.
I spend some time using github search to find usage of
innerrel_is_unique() and relation_has_unique_index() in extensions.
Didn't find extensions actually use them. Some forks are using them
in the way PostgreSQL doesn't. But for forks it's way easier to adopt
such trivial API changes. If even there are some extensions using
this functions, it wouldn't be a terrible case of the broken API. The
current way seems to be rather bloating of our API. This is why I
decided to remove innerrel_is_unique_ext() and
relation_has_unique_index_ext(), and just add arguments to
innerrel_is_unique() and relation_has_unique_index_for().
I also did some rephrasing and grammar fixes for comments.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v10-0001-Remove-useless-self-joins.patchapplication/octet-stream; name=v10-0001-Remove-useless-self-joins.patchDownload
From dd13822ea73c5c387cd12053aa1d18be59603a90 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Mon, 7 Oct 2024 10:43:04 +0700
Subject: [PATCH v10] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self-join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, the inner restrictlist moves to the outer one, and duplicated
clauses are removed. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces the
number of restriction clauses === selectivity estimations, and potentially can
improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch, we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baserestrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
possibility of self-join elimination, the inner and outer clauses must match
exactly.
The relation replacement procedure is not trivial, and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/equivclass.c | 3 +-
src/backend/optimizer/path/indxpath.c | 28 +-
src/backend/optimizer/path/joinpath.c | 6 +-
src/backend/optimizer/plan/analyzejoins.c | 1153 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 9 +-
src/backend/optimizer/util/pathnode.c | 3 +-
src/backend/rewrite/rewriteManip.c | 106 +-
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 35 +-
src/include/optimizer/optimizer.h | 2 +
src/include/optimizer/paths.h | 3 +-
src/include/optimizer/planmain.h | 5 +-
src/include/rewrite/rewriteManip.h | 3 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1079 +++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 490 +++++++++
src/tools/pgindent/typedefs.list | 2 +
21 files changed, 2902 insertions(+), 105 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index dc401087dc6..777a479e0ca 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5649,6 +5649,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
+ <term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index fae137dd825..13d751c7b40 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -880,7 +880,8 @@ find_computable_ec_member(PlannerInfo *root,
emvars = pull_var_clause((Node *) em->em_expr,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ PVC_INCLUDE_PLACEHOLDERS |
+ PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc2, emvars)
{
if (!list_member(exprvars, lfirst(lc2)))
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index c0fcc7d78df..51f7651c727 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3434,12 +3434,15 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
*
* The caller need only supply equality conditions arising from joins;
* this routine automatically adds in any usable baserestrictinfo clauses.
+ * If extra_clauses isn't NULL, it is set to baserestrictinfo clauses which
+ * were used to derive the uniqueness.
* (Note that the passed-in restrictlist will be destructively modified!)
*/
bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist)
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -3495,6 +3498,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3546,6 +3550,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -3588,7 +3610,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 39711384801..3f88393a04a 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -188,7 +188,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
JOIN_INNER,
restrictlist,
- false);
+ false,
+ NULL);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
@@ -197,7 +198,8 @@ add_paths_to_joinrel(PlannerInfo *root,
innerrel,
jointype,
restrictlist,
- false);
+ false,
+ NULL);
break;
}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 5bc16c4bfc7..7b42f5ed355 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -30,12 +31,30 @@
#include "optimizer/placeholder.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. It is needed to perform a sorting procedure and simplify
+ * the search of SJE-candidate baserels referencing the same database relation.
+ * Having collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts to
+ * remove self-joins.
+ * Preliminary sorting prevents quadratic behaviour that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_removal;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
@@ -43,14 +62,16 @@ static void remove_rel_from_eclass(EquivalenceClass *ec,
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -88,7 +109,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -261,8 +282,8 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
continue; /* not mergejoinable */
/*
- * Check if the clause has the form "outer op inner" or "inner op
- * outer", and if so mark which side is inner.
+ * Check if clause has the form "outer op inner" or "inner op outer",
+ * and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, sjinfo->min_lefthand,
innerrel->relids))
@@ -276,7 +297,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -288,36 +309,69 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Substitute newId by oldId in relids.
+ * Works almost like the adjust_relid_set, but process negative newrelid case.
+ *
+ * We must make a copy of the original Bitmapset before making any
+ * modifications, because the same pointer to it might be shared among
+ * different places.
+ * Also, this function can be used in 'delete only' mode (newId < 0).
+ * It allows us to utilise the same code in the remove_useless_joins and the
+ * remove_self_joins features.
+ */
+static Bitmapset *
+replace_relid(Relids relids, int oldrelid, int newrelid)
+{
+ if (IS_SPECIAL_VARNO(oldrelid))
+ return relids;
+
+ /* Delete relid without substitution. */
+ if (IS_SPECIAL_VARNO(newrelid))
+ {
+ relids = bms_copy(relids);
+ return bms_del_member(relids, oldrelid);
+ }
+
+ /* Substitute newId for oldId. */
+ if (bms_is_member(oldrelid, relids))
+ {
+ /* Ensure we have a modifiable copy */
+ relids = bms_copy(relids);
+
+ /* Remove old, add new */
+ relids = bms_del_member(relids, oldrelid);
+ return bms_add_member(relids, newrelid);
+ }
+
+ return relids;
+}
+
+/*
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -331,30 +385,23 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
SpecialJoinInfo *sjinf = (SpecialJoinInfo *) lfirst(l);
- /*
- * initsplan.c is fairly cavalier about allowing SpecialJoinInfos'
- * lefthand/righthand relid sets to be shared with other data
- * structures. Ensure that we don't modify the original relid sets.
- * (The commute_xxx sets are always per-SpecialJoinInfo though.)
- */
- sjinf->min_lefthand = bms_copy(sjinf->min_lefthand);
- sjinf->min_righthand = bms_copy(sjinf->min_righthand);
- sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
- sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, ojrelid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, ojrelid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, ojrelid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, ojrelid, subst);
+
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = replace_relid(sjinf->commute_above_l, ojrelid, subst);
+ sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, ojrelid, subst);
+ sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, ojrelid, subst);
+ sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, ojrelid, subst);
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
}
/*
@@ -375,10 +422,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -388,21 +435,74 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
/* Reduce ph_needed to contain only "relation 0"; see below */
if (bms_is_member(0, phinfo->ph_needed))
phinfo->ph_needed = bms_make_singleton(0);
else
phinfo->ph_needed = NULL;
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[rti];
+
+ if (otherrel == NULL)
+ continue;
+
+ /* Update lateral_vars list. */
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
+ }
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ List *joininfos;
+ int ojrelid = sjinfo->ojrelid;
+ RelOptInfo *rel = find_base_rel(root, relid);
+ Relids join_plus_commute;
+ Relids joinrelids;
+ Index rti;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
+
/*
* Remove any joinquals referencing the rel from the joininfo lists.
*
@@ -770,7 +870,7 @@ reduce_unique_semijoins(PlannerInfo *root)
/* Test whether the innerrel is unique for those clauses. */
if (!innerrel_is_unique(root,
joinrelids, sjinfo->min_lefthand, innerrel,
- JOIN_SEMI, restrictlist, true))
+ JOIN_SEMI, restrictlist, true, NULL))
continue;
/* OK, remove the SpecialJoinInfo from the list. */
@@ -844,9 +944,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * extra_clauses contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the extra_clauses, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -862,7 +967,8 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
* relation_has_unique_index_for automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1166,7 +1272,8 @@ distinct_col_search(int colno, List *colnos, List *opids)
* In particular, the force_cache argument allows overriding the internal
* heuristic about whether to cache negative answers; it should be "true"
* if making an inquiry that is not part of the normal bottom-up join search
- * sequence.
+ * sequence. If extra_clauses isn't NULL, it is set to baserestrictinfo
+ * clauses which were used to derive the uniqueness.
*/
bool
innerrel_is_unique(PlannerInfo *root,
@@ -1175,13 +1282,22 @@ innerrel_is_unique(PlannerInfo *root,
RelOptInfo *innerrel,
JoinType jointype,
List *restrictlist,
- bool force_cache)
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
- /* Certainly can't prove uniqueness when there are no joinclauses */
- if (restrictlist == NIL)
+ /*
+ * Don't waste cycles attempting to prove uniqueness when there are no
+ * join clauses. Still, try to do it in an attempt to remove a join
+ * because a degenerate case can take place and may be profitable for the
+ * performance.
+ */
+ if (restrictlist == NIL && !self_join)
return false;
/*
@@ -1193,17 +1309,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels, because we need extra clauses to be valid for our case.
+ * Also, for self-join checking we've filtered the clauses list. Thus,
+ * for a self-join search, we can match only the result cached for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1347,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1361,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1416,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1312,17 +1447,883 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+
+ foreach_node(EquivalenceMember, em, ec->ec_members)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach_node(EquivalenceMember, other, new_members)
+ {
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach_node(RestrictInfo, rinfo, ec->ec_sources)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach_node(RestrictInfo, other, new_sources)
+ {
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach_node(RestrictInfo, rinfo, joininfos)
+ {
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach_node(RestrictInfo, rinfo, toRemove->baserestrictinfo)
+ {
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach_node(RestrictInfo, rinfo, binfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->baserestrictinfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach_node(RestrictInfo, rinfo, jinfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->joininfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Replace varno in all the query structures, except nodes RangeTblRef
+ * otherwise later remove_rel_from_joinlist will yield errors.
+ */
+ ChangeVarNodesExtended((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ replace_relid(root->all_result_relids, toRemove->relid, toKeep->relid);
+ replace_relid(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach_node(RestrictInfo, rinfo, joinquals)
+ {
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ foreach_node(RestrictInfo, rinfo, uclauses)
+ {
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach_node(RestrictInfo, orinfo, outer->baserestrictinfo)
+ {
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *outer = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *inner = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ *
+ * This call should work symmetrically for inner and outer.
+ * Switching places for inner and outer here could only result in
+ * having "x2 = x1" clause instead of "x1 = x2".
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ outer->relids,
+ inner, NULL);
+
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, outer->relid, inner->relid);
+
+ Assert(list_length(restrictlist) ==
+ (list_length(selfjoinquals) + list_length(otherjoinquals)));
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique(root, joinrelids, outer->relids,
+ inner, JOIN_INNER, selfjoinquals,
+ false, &uclauses))
+ continue;
+
+ /*
+ * uclauses is the copy of outer->baserestrictinfo that associated
+ * with a index. We already proven that based on
+ * {outer->baserestrictinfo, selfjoinquals} comparsion with unique
+ * index we can guarantees that for the outer rel there is at most
+ * one row where columns equal given values. Sometimes that is not
+ * ok. e.g. "where s1.b = s2.b and s1.a = 1 and s2.a = 2". (the
+ * index is based on column(a,b). in these scarenio, we also need
+ * to validate equality of outer->baserestrictinfo with
+ * inner->outer->baserestrictinfo, or else we won't match the same
+ * row on each side of the join.
+ */
+ if (!match_unique_clauses(root, outer, uclauses, inner->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, inner, outer, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be
+ * removed, and these relations should not have TABLESAMPLE
+ * clauses specified. Removing a relation with TABLESAMPLE clause
+ * could potentially change the syntax of the query. Because of
+ * UPDATE/DELETE EPQ meachanism, currently Query->resultRelation
+ * or Query->mergeTargetRelation associated rel cannot be
+ * eliminated.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL &&
+ varno != root->parse->resultRelation &&
+ varno != root->parse->mergeTargetRelation)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_removal || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5c3e..075d36c7ecc 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -230,6 +230,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index a0baf6d4a18..0b525909d47 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -662,14 +662,17 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
* generate_append_tlist, and those would confuse estimate_num_groups
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
- * be an RTE corresponding to each setop's output.
+ * be an RTE corresponding to each setop's output. Note, we use this not
+ * subquery's targetlist but subroot->parse's targetlist, because it was
+ * revised by self-join removal. subquery's targetlist might contain the
+ * references to the removed relids.
*/
if (pNumGroups)
{
@@ -682,7 +685,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fc97bf6ee26..85944988549 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1796,7 +1796,8 @@ create_unique_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
if (rel->rtekind == RTE_RELATION && sjinfo->semi_can_btree &&
relation_has_unique_index_for(root, rel, NIL,
sjinfo->semi_rhs_exprs,
- sjinfo->semi_operators))
+ sjinfo->semi_operators,
+ NULL))
{
pathnode->umethod = UNIQUE_PATH_NOOP;
pathnode->path.rows = rel->rows;
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index 8f90afb3269..bf9ddf64e25 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -548,6 +548,8 @@ offset_relid_set(Relids relids, int offset)
* to 'new_index'. The varnosyn fields are changed too. Also, adjust other
* nodes that contain rangetable indexes, such as RangeTblRef and JoinExpr.
*
+ * change_RangeTblRef: do we change RangeTblRef or not.
+ * (see ChangeVarNodesExtended)
* NOTE: although this has the form of a walker, we cheat and modify the
* nodes in-place. The given expression tree should have been copied
* earlier to ensure that no unwanted side-effects occur!
@@ -558,6 +560,7 @@ typedef struct
int rt_index;
int new_index;
int sublevels_up;
+ bool change_RangeTblRef;
} ChangeVarNodes_context;
static bool
@@ -590,7 +593,7 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
cexpr->cvarno = context->new_index;
return false;
}
- if (IsA(node, RangeTblRef))
+ if (IsA(node, RangeTblRef) && context->change_RangeTblRef)
{
RangeTblRef *rtr = (RangeTblRef *) node;
@@ -637,6 +640,75 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ adjust_relid_set(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ adjust_relid_set(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ adjust_relid_set(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ adjust_relid_set(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ adjust_relid_set(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ adjust_relid_set(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * For SJE (self join elimination), changing varnos will make t1.a
+ * = t2.a to "t1.a = t1.a", which is always true, we can optizmied
+ * it out. To optimzie it out, we use equal to validate, but we
+ * also need to add a not null qual (NullTest).
+ *
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
@@ -670,32 +742,30 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
(void *) context);
}
+/*
+ * Similar to ChangeVarNodes, but has the following difference. When
+ * ChangeVarNodes is applied to some noe, it is also applied to all of its
+ * underlying nodes. ChangeVarNodesExtended implements additional parameter
+ * change_RangeTblRef, which SJE (self-join elimination) uses to skip changing
+ * RangeTblRef. During SJE RangeTblRef's are removed during the last step:
+ * remove_rel_from_joinlist(). If we update RangeTblRef's here,
+ * remove_rel_from_joinlist() would have an error.
+*/
void
-ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef)
{
ChangeVarNodes_context context;
context.rt_index = rt_index;
context.new_index = new_index;
context.sublevels_up = sublevels_up;
+ context.change_RangeTblRef = change_RangeTblRef;
- /*
- * Must be prepared to start with a Query or a bare expression tree; if
- * it's a Query, go straight to query_tree_walker to make sure that
- * sublevels_up doesn't get incremented prematurely.
- */
if (node && IsA(node, Query))
{
Query *qry = (Query *) node;
- /*
- * If we are starting at a Query, and sublevels_up is zero, then we
- * must also fix rangetable indexes in the Query itself --- namely
- * resultRelation, mergeTargetRelation, exclRelIndex and rowMarks
- * entries. sublevels_up cannot be zero when recursing into a
- * subquery, so there's no need to have the same logic inside
- * ChangeVarNodes_walker.
- */
if (sublevels_up == 0)
{
ListCell *l;
@@ -706,7 +776,6 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
if (qry->mergeTargetRelation == rt_index)
qry->mergeTargetRelation = new_index;
- /* this is unlikely to ever be used, but ... */
if (qry->onConflict && qry->onConflict->exclRelIndex == rt_index)
qry->onConflict->exclRelIndex = new_index;
@@ -724,6 +793,11 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
else
ChangeVarNodes_walker(node, &context);
}
+void
+ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+{
+ ChangeVarNodesExtended(node, rt_index, new_index, sublevels_up, true);
+}
/*
* Substitute newrelid for oldrelid in a Relid set
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 8a67f01200c..99212aeec8b 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -986,6 +986,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_removal", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_removal,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index add0f9e45fc..1e9d3a0174d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -734,7 +734,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -973,7 +973,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3443,4 +3443,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 93e3dc719da..5e69cec7569 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -191,6 +191,8 @@ extern SortGroupClause *get_sortgroupref_clause_noerr(Index sortref,
* output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0020 /* recurse into PlaceHolderVar
* arguments */
+#define PVC_INCLUDE_CONVERTROWTYPES 0x0040 /* include ConvertRowtypeExprs in
+ * output list */
extern Bitmapset *pull_varnos(PlannerInfo *root, Node *node);
extern Bitmapset *pull_varnos_of_level(PlannerInfo *root, Node *node, int levelsup);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 54869d44013..cccf3ea9916 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -71,7 +71,8 @@ extern void generate_partitionwise_join_paths(PlannerInfo *root,
extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
- List *exprlist, List *oprlist);
+ List *exprlist, List *oprlist,
+ List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 93137261e48..28baf7c7962 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_removal;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -111,7 +112,9 @@ extern bool query_supports_distinctness(Query *query);
extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
- JoinType jointype, List *restrictlist, bool force_cache);
+ JoinType jointype, List *restrictlist, bool force_cache,
+ List **extra_clauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index ac6d2049e80..e556035be6b 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -46,6 +47,8 @@ extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
extern void ChangeVarNodes(Node *node, int rt_index, int new_index,
int sublevels_up);
+void ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef);
extern void IncrementVarSublevelsUp(Node *node, int delta_sublevels_up,
int min_sublevels_up);
extern void IncrementVarSublevelsUp_rtable(List *rtable,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index a328164fe0f..f328e92b082 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -430,6 +430,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9b2973694ff..f9961f0824d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6293,6 +6293,1085 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+ QUERY PLAN
+----------------------------------------------------------------
+ Nested Loop Semi Join
+ Join Filter: (am.amname = c2.relname)
+ -> Seq Scan on pg_am am
+ -> Materialize
+ -> Index Scan using pg_class_oid_index on pg_class c2
+ Index Cond: (oid < '10'::oid)
+(6 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is derived from
+-- baserestrictinfo clauses.
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Two approaches to write a join. In both cases it is sel-join which addresses
+-- the same physical tuple and can be removed from the join search space.
+EXPLAIN (COSTS OFF) SELECT * FROM sj j1, sj j2 WHERE j1.a = 3 AND j2.a = 3;
+ QUERY PLAN
+-------------------
+ Seq Scan on sj j2
+ Filter: (a = 3)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT * FROM sj j1 JOIN sj j2 ON (j1.a = 3 AND j2.a = 3);
+ QUERY PLAN
+-------------------
+ Seq Scan on sj j2
+ Filter: (a = 3)
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+-------------------------------------------
+ HashSetOp Except All
+ -> Append
+ -> Subquery Scan on "*SELECT* 1"
+ -> Seq Scan on emp1 c2
+ -> Subquery Scan on "*SELECT* 2"
+ -> Seq Scan on emp1 c3
+(6 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index fad7fc3a7e0..7fcc07c20c3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -167,10 +167,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_removal | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 28ed7910d01..7fc2159349b 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 4c9c3e9f49b..689c762b4d6 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2379,6 +2379,496 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of Lateral links from top-level query to the removing relation
+explain (COSTS OFF)
+SELECT * FROM pg_am am WHERE am.amname IN (
+ SELECT c1.relname AS relname
+ FROM pg_class c1
+ JOIN pg_class c2
+ ON c1.oid=c2.oid AND c1.oid < 10
+);
+
+--
+-- SJE corner case: uniqueness of an inner is derived from
+-- baserestrictinfo clauses.
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Two approaches to write a join. In both cases it is sel-join which addresses
+-- the same physical tuple and can be removed from the join search space.
+EXPLAIN (COSTS OFF) SELECT * FROM sj j1, sj j2 WHERE j1.a = 3 AND j2.a = 3;
+EXPLAIN (COSTS OFF) SELECT * FROM sj j1 JOIN sj j2 ON (j1.a = 3 AND j2.a = 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 171a7dd5d2b..d38dd981b6a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2575,6 +2575,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -4023,6 +4024,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.5 (Apple Git-154)
On Sat, Jul 20, 2024 at 2:38 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
We initially didn't use ChangeVarNodes() in SJE at all. See the last
patch version without it [1]. We're trying to address Tom Lane's
proposal to re-use more of existing tree-manipulation infrastructure
[2]. I agree with you that the case with ChangeVarNodes() looks
questionable. Do you have other ideas how we can re-use some more of
existing tree-manipulation infrastructure in SJE?
I think there was one idea to be considered. Given that this feature
has fragility to changes in PlannerInfo and underlying structures,
Andrei proposed to generate a walker over PlannerInfo with a Perl
script. I spent some time analyzing this idea. I came to the
following conclusions.
It seems feasible to generate a function that would walk over all
PlannerInfo fields recursively. Perhaps some more meta-information is
needed, at least, to check whether we should visit a particular
pointer field. But that could be provided by additional
pg_node_attr(). With this approach, we would need to sacrifice some
efficiency (e.g., processing all the EquivalenceClasses instead of
picking only the ones used in the relation to be removed). Also, the
logic in remove_self_join_rel() is more complex than processing all
the Relids. That is, we would still need a massive number of
if-branches specifying how we handle each node type. It doesn't look
like we would end up with a more simple or less error-prone
implementation.
We may decide to generate not just a walker but most of the logic in
remove_self_join_rel(). This is probably possible by injecting way
more meta-information into definitions of structures. That isn't
going to be simpler than the current approach. But it is probably
less error-prone: one could realize that if you add a new field to the
structure, it should have a similar pg_node_attr() as surroundings.
But I am afraid that if we go this way, we may end up with an awkward
heap of pg_node_attr() to generate the functionality of
remove_self_join_rel(). Should we better add comments to PlannerInfo
and other relevant structures saying: if you're going to modify this,
consider how that affects remove_self_join_rel()?
Any thoughts?
Links
1. /messages/by-id/64486b0b-0404-e39e-322d-0801154901f3@postgrespro.ru
------
Regards,
Alexander Korotkov
Supabase
On 9/12/2024 13:03, Alexander Korotkov wrote:
remove_self_join_rel(). Should we better add comments to PlannerInfo
and other relevant structures saying: if you're going to modify this,
consider how that affects remove_self_join_rel()?Any thoughts?
As I see, it is quite typical to keep two parts of the code in sync by
mentioning them in comments (see reparameterisation stuff, for example).
This would reduce the code needed to implement the feature.
--
regards, Andrei Lepikhov
On 12/9/24 13:03, Alexander Korotkov wrote:
On Sat, Jul 20, 2024 at 2:38 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
We may decide to generate not just a walker but most of the logic in
remove_self_join_rel(). This is probably possible by injecting way
more meta-information into definitions of structures. That isn't
going to be simpler than the current approach. But it is probably
less error-prone: one could realize that if you add a new field to the
structure, it should have a similar pg_node_attr() as surroundings.
But I am afraid that if we go this way, we may end up with an awkward
heap of pg_node_attr() to generate the functionality of
remove_self_join_rel(). Should we better add comments to PlannerInfo
and other relevant structures saying: if you're going to modify this,
consider how that affects remove_self_join_rel()?Any thoughts?
Observing positive cases caused by the SJE feature, I resumed the work
and, following Alexander's suggestion, added developer comments on
checking remove_self_join_rel in case of the PlannerInfo changes.
I see now that it helps apply after pull-up transformations and,
sometimes, because of clauses derived from EquivalenceClasses (see the
patch comment for examples). So, it is not only about dumb ORM-generated
queries.
Also, multiple code changes since the v7 version caused corresponding
changes in the SJE code on rebase. So, the new version provided in the
attachment needs a fresh review.
--
regards, Andrei Lepikhov
Attachments:
v8-0001-Remove-useless-self-joins.patchtext/x-patch; charset=UTF-8; name=v8-0001-Remove-useless-self-joins.patchDownload
From 3847fd34df918166b8b9c24547cc7683a847d76e Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Fri, 20 Dec 2024 15:55:09 +0700
Subject: [PATCH v8] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
improves total planner prediction for the query.
This feature is dedicated to avoiding redundancy which can appear after pull-up
transformations or the creation of an EquivalenceClass-derived clause like
the below:
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x=t1.x);
SELECT * FROM t1,t2, t1 t3 WHERE t1.x=t2.x AND t2.x=t3.x;
After some additional coding, it would potentially reduce redundancy caused by
subquery pull-up after unnecessary outer join removal. Something like that
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x=t1.x);
Also, it can drastically help to join partitioned tables, removing entries even
before their expansion.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/equivclass.c | 3 +-
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1238 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 7 +-
src/backend/rewrite/rewriteManip.c | 134 ++-
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 40 +-
src/include/optimizer/optimizer.h | 2 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/include/rewrite/rewriteManip.h | 4 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1062 ++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 483 ++++++++
src/tools/pgindent/typedefs.list | 3 +
19 files changed, 2965 insertions(+), 139 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fbdd6ce574..579481cea8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5678,6 +5678,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_elimination" xreflabel="enable_self_join_elimination">
+ <term><varname>enable_self_join_elimination</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_elimination</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index fae137dd82..510b4465d2 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -852,7 +852,8 @@ find_computable_ec_member(PlannerInfo *root,
exprvars = pull_var_clause((Node *) exprs,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ PVC_INCLUDE_PLACEHOLDERS |
+ PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc, ec->ec_members)
{
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 5d102a0d37..4fd1fc314a 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4149,6 +4149,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -4204,6 +4220,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -4255,6 +4272,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -4297,7 +4332,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 5bc16c4bfc..89a590a033 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -30,27 +31,47 @@
#include "optimizer/placeholder.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. It is needed to perform a sorting procedure and simplify
+ * the search of SJE-candidate baserels referencing the same database relation.
+ * Having collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts to
+ * remove self-joins.
+ * Preliminary sorting prevents quadratic behaviour that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_elimination;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
- int relid, int ojrelid);
+ int relid, int ojrelid, int subst);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -88,7 +109,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -276,7 +297,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -288,36 +309,31 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -341,20 +357,33 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
sjinf->min_righthand = bms_copy(sjinf->min_righthand);
sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
- /* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ /* Now remove relid from the sets: */
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+
+ if (sjinfo != NULL)
+ {
+ Assert(subst <= 0 && ojrelid > 0);
+
+ /* Remove ojrelid bits from the sets: */
+ sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
+ sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
+ sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
+ sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ /* relid cannot appear in these fields, but ojrelid can: */
+ sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
+ sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
+ sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
+ sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ }
+ else
+ {
+ Assert(subst > 0 && ojrelid == -1);
+
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
+ }
}
/*
@@ -375,10 +404,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -388,21 +417,111 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
/* Reduce ph_needed to contain only "relation 0"; see below */
if (bms_is_member(0, phinfo->ph_needed))
phinfo->ph_needed = bms_make_singleton(0);
else
phinfo->ph_needed = NULL;
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+ /*
+ * Likewise remove references from EquivalenceClasses.
+ */
+ foreach(l, root->eq_classes)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
+
+ if (bms_is_member(relid, ec->ec_relids) ||
+ (sjinfo == NULL || bms_is_member(ojrelid, ec->ec_relids)))
+ remove_rel_from_eclass(ec, relid, ojrelid, subst);
+ }
+
+ /*
+ * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
+ * ph_needed relid sets. These have to be known accurately, else we may
+ * fail to remove other now-removable outer joins. And our removal of the
+ * join clause(s) for this outer join may mean that Vars that were
+ * formerly needed no longer are. So we have to do this honestly by
+ * repeating the construction of those relid sets. We can cheat to one
+ * small extent: we can avoid re-examining the targetlist and HAVING qual
+ * by preserving "relation 0" bits from the existing relid sets. This is
+ * safe because we'd never remove such references.
+ *
+ * So, start by removing all other bits from attr_needed sets and
+ * lateral_vars lists. (We already did this above for ph_needed.)
+ */
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[rti];
+ int attroff;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == rti); /* sanity check on array */
+
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0;
+ attroff--)
+ {
+ if (bms_is_member(0, otherrel->attr_needed[attroff]))
+ otherrel->attr_needed[attroff] = bms_make_singleton(0);
+ else
+ otherrel->attr_needed[attroff] = NULL;
+ }
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
+ }
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ RelOptInfo *rel = find_base_rel(root, relid);
+ int ojrelid = sjinfo->ojrelid;
+ Relids joinrelids;
+ Relids join_plus_commute;
+ List *joininfos;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
+
/*
* Remove any joinquals referencing the rel from the joininfo lists.
*
@@ -465,18 +584,6 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
}
}
- /*
- * Likewise remove references from EquivalenceClasses.
- */
- foreach(l, root->eq_classes)
- {
- EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
-
- if (bms_is_member(relid, ec->ec_relids) ||
- bms_is_member(ojrelid, ec->ec_relids))
- remove_rel_from_eclass(ec, relid, ojrelid);
- }
-
/*
* There may be references to the rel in root->fkey_list, but if so,
* match_foreign_keys_to_quals() will get rid of them.
@@ -492,42 +599,6 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
/* And nuke the RelOptInfo, just in case there's another access path */
pfree(rel);
- /*
- * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
- * ph_needed relid sets. These have to be known accurately, else we may
- * fail to remove other now-removable outer joins. And our removal of the
- * join clause(s) for this outer join may mean that Vars that were
- * formerly needed no longer are. So we have to do this honestly by
- * repeating the construction of those relid sets. We can cheat to one
- * small extent: we can avoid re-examining the targetlist and HAVING qual
- * by preserving "relation 0" bits from the existing relid sets. This is
- * safe because we'd never remove such references.
- *
- * So, start by removing all other bits from attr_needed sets. (We
- * already did this above for ph_needed.)
- */
- for (rti = 1; rti < root->simple_rel_array_size; rti++)
- {
- RelOptInfo *otherrel = root->simple_rel_array[rti];
- int attroff;
-
- /* there may be empty slots corresponding to non-baserel RTEs */
- if (otherrel == NULL)
- continue;
-
- Assert(otherrel->relid == rti); /* sanity check on array */
-
- for (attroff = otherrel->max_attr - otherrel->min_attr;
- attroff >= 0;
- attroff--)
- {
- if (bms_is_member(0, otherrel->attr_needed[attroff]))
- otherrel->attr_needed[attroff] = bms_make_singleton(0);
- else
- otherrel->attr_needed[attroff] = NULL;
- }
- }
-
/*
* Now repeat construction of attr_needed bits coming from all other
* sources.
@@ -607,13 +678,13 @@ remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid)
* level(s).
*/
static void
-remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
+remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid, int subst)
{
ListCell *lc;
/* Fix up the EC's overall relids */
- ec->ec_relids = bms_del_member(ec->ec_relids, relid);
- ec->ec_relids = bms_del_member(ec->ec_relids, ojrelid);
+ ec->ec_relids = replace_relid(ec->ec_relids, relid, subst);
+ ec->ec_relids = replace_relid(ec->ec_relids, ojrelid, subst);
/*
* Fix up the member expressions. Any non-const member that ends with
@@ -625,11 +696,11 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc);
if (bms_is_member(relid, cur_em->em_relids) ||
- bms_is_member(ojrelid, cur_em->em_relids))
+ (ojrelid != -1 && bms_is_member(ojrelid, cur_em->em_relids)))
{
Assert(!cur_em->em_is_const);
- cur_em->em_relids = bms_del_member(cur_em->em_relids, relid);
- cur_em->em_relids = bms_del_member(cur_em->em_relids, ojrelid);
+ cur_em->em_relids = replace_relid(cur_em->em_relids, relid, subst);
+ cur_em->em_relids = replace_relid(cur_em->em_relids, ojrelid, subst);
if (bms_is_empty(cur_em->em_relids))
ec->ec_members = foreach_delete_current(ec->ec_members, lc);
}
@@ -640,7 +711,10 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
- remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
+ if (ojrelid == -1)
+ ChangeVarNodes((Node *) rinfo, relid, subst, 0);
+ else
+ remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
}
/*
@@ -844,9 +918,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * extra_clauses contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the extra_clauses, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +938,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,9 +1256,33 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
+ * additional clauses from a baserestrictinfo list that were used to prove
+ * uniqueness. A non NULL 'extra_clauses' indicates that we're checking
+ * for self-join and correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1193,17 +1297,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels, because we need extra clauses to be valid for our case.
+ * Also, for self-join checking we've filtered the clauses list. Thus,
+ * for a self-join search, we can match only the result cached for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1335,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1349,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1404,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1312,17 +1435,898 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+
+ foreach_node(EquivalenceMember, em, ec->ec_members)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach_node(EquivalenceMember, other, new_members)
+ {
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach_node(RestrictInfo, rinfo, ec->ec_sources)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach_node(RestrictInfo, other, new_sources)
+ {
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ *
+ * NOTE: Remember to keep the code in sync with PlannerInfo to be sure all
+ * cached relids and relid bitmapsets can be correctly cleaned during the self
+ * join elimination procedure.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach_node(RestrictInfo, rinfo, joininfos)
+ {
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach_node(RestrictInfo, rinfo, toRemove->baserestrictinfo)
+ {
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach_node(RestrictInfo, rinfo, binfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->baserestrictinfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach_node(RestrictInfo, rinfo, jinfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->joininfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Replace varno in all the query structures, except nodes RangeTblRef
+ * otherwise later remove_rel_from_joinlist will yield errors.
+ */
+ ChangeVarNodesExtended((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ replace_relid(root->all_result_relids, toRemove->relid, toKeep->relid);
+ replace_relid(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+
+ /*
+ * Now repeat construction of attr_needed bits coming from all other
+ * sources.
+ */
+ rebuild_placeholder_attr_needed(root);
+ rebuild_joinclause_attr_needed(root);
+ rebuild_eclass_attr_needed(root);
+ rebuild_lateral_attr_needed(root);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach_node(RestrictInfo, rinfo, joinquals)
+ {
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ foreach_node(RestrictInfo, rinfo, uclauses)
+ {
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach_node(RestrictInfo, orinfo, outer->baserestrictinfo)
+ {
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+ if (restrictlist == NIL)
+ continue;
+
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ Assert(list_length(restrictlist) ==
+ (list_length(selfjoinquals) + list_length(otherjoinquals)));
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * uclauses is the copy of outer->baserestrictinfo that associated with a index.
+ * We already proven that based on {outer->baserestrictinfo, selfjoinquals}
+ * comparsion with unique index we can guarantees that for the outer rel there is
+ * at most one row where columns equal given values. Sometimes that is not ok. e.g.
+ * "where s1.b = s2.b and s1.a = 1 and s2.a = 2". (the index is based on column(a,b).
+ * in these scarenio, we also need to validate equality of outer->baserestrictinfo
+ * with inner->outer->baserestrictinfo,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be removed,
+ * and these relations should not have TABLESAMPLE clauses
+ * specified. Removing a relation with TABLESAMPLE clause could
+ * potentially change the syntax of the query.
+ * Because of UPDATE/DELETE EPQ meachanism, currently Query->resultRelation
+ * or Query->mergeTargetRelation associated rel cannot be eliminated.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL &&
+ varno != root->parse->resultRelation &&
+ varno != root->parse->mergeTargetRelation)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_elimination || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 735560e8ca..2916b36e30 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -233,6 +233,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 698ef601be..2155fd487b 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -639,7 +639,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
@@ -647,6 +647,9 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
* be an RTE corresponding to each setop's output.
+ * Note, we use this not subquery's targetlist but subroot->parse's
+ * targetlist, because it was revised by self-join removal. subquery's
+ * targetlist might contain the references to the removed relids.
*/
if (pNumGroups)
{
@@ -659,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index f4e687c986..f23ee82992 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -535,6 +535,33 @@ offset_relid_set(Relids relids, int offset)
return result;
}
+/*
+ * Substitute newId by oldId in relids.
+ *
+ * We must make a copy of the original Bitmapset before making any
+ * modifications, because the same pointer to it might be shared among
+ * different places.
+ * Also, this function can be used in 'delete only' mode (newId < 0).
+ * It allows us to utilise the same code in the remove_useless_joins and the
+ * remove_self_joins features.
+ */
+Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (IS_SPECIAL_VARNO(oldId))
+ return relids;
+
+ /* Delete relid without substitution. */
+ if (IS_SPECIAL_VARNO(newId))
+ return bms_del_member(bms_copy(relids), oldId);
+
+ /* Substitute newId for oldId. */
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
/*
* ChangeVarNodes - adjust Var nodes for a specific change of RT index
*
@@ -543,6 +570,7 @@ offset_relid_set(Relids relids, int offset)
* to 'new_index'. The varnosyn fields are changed too. Also, adjust other
* nodes that contain rangetable indexes, such as RangeTblRef and JoinExpr.
*
+ * change_RangeTblRef: do we change RangeTblRef or not. see ChangeVarNodesExtended.
* NOTE: although this has the form of a walker, we cheat and modify the
* nodes in-place. The given expression tree should have been copied
* earlier to ensure that no unwanted side-effects occur!
@@ -553,6 +581,7 @@ typedef struct
int rt_index;
int new_index;
int sublevels_up;
+ bool change_RangeTblRef;
} ChangeVarNodes_context;
static bool
@@ -585,7 +614,7 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
cexpr->cvarno = context->new_index;
return false;
}
- if (IsA(node, RangeTblRef))
+ if (IsA(node, RangeTblRef) && context->change_RangeTblRef)
{
RangeTblRef *rtr = (RangeTblRef *) node;
@@ -632,6 +661,75 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ replace_relid(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ replace_relid(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ replace_relid(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ replace_relid(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ replace_relid(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ replace_relid(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * for SJE (self join elimination), changing varnos will make t1.a = t2.a
+ * to "t1.a = t1.a", which is always true, we can optizmied it out.
+ * to optimzied it out, we use equal to validate,
+ * but we also need to add a not null qual (NullTest).
+ *
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
@@ -664,32 +762,31 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
return expression_tree_walker(node, ChangeVarNodes_walker, context);
}
+/*
+ * almost equivalent to ChangeVarNodes. also see comments in ChangeVarNodes.
+ * difference with ChangeVarNodes:
+ * when we use ChangeVarNodes for node we change all of it's underlying nodes.
+ * but for SJE we don't want to change node type: RangeTblRef, in some occasion.
+ * because SJE last step, remove_rel_from_joinlist is to
+ * remove left node (RangeTblRef) one by one. if in any case while using
+ * ChangeVarNodes, by accidently leaf node RangeTblRef also being updated
+ * then remove_rel_from_joinlist will have error.
+*/
void
-ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef)
{
ChangeVarNodes_context context;
context.rt_index = rt_index;
context.new_index = new_index;
context.sublevels_up = sublevels_up;
+ context.change_RangeTblRef = change_RangeTblRef;
- /*
- * Must be prepared to start with a Query or a bare expression tree; if
- * it's a Query, go straight to query_tree_walker to make sure that
- * sublevels_up doesn't get incremented prematurely.
- */
if (node && IsA(node, Query))
{
Query *qry = (Query *) node;
- /*
- * If we are starting at a Query, and sublevels_up is zero, then we
- * must also fix rangetable indexes in the Query itself --- namely
- * resultRelation, mergeTargetRelation, exclRelIndex and rowMarks
- * entries. sublevels_up cannot be zero when recursing into a
- * subquery, so there's no need to have the same logic inside
- * ChangeVarNodes_walker.
- */
if (sublevels_up == 0)
{
ListCell *l;
@@ -700,7 +797,6 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
if (qry->mergeTargetRelation == rt_index)
qry->mergeTargetRelation = new_index;
- /* this is unlikely to ever be used, but ... */
if (qry->onConflict && qry->onConflict->exclRelIndex == rt_index)
qry->onConflict->exclRelIndex = new_index;
@@ -718,6 +814,12 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
ChangeVarNodes_walker(node, &context);
}
+void
+ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+{
+ ChangeVarNodesExtended(node, rt_index, new_index, sublevels_up, true);
+}
+
/*
* Substitute newrelid for oldrelid in a Relid set
*
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 8cf1afbad2..91364c3709 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -986,6 +986,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_elimination", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_elimination,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 58748d2ca6..3b2a4dbcb6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -185,6 +185,11 @@ typedef struct PlannerGlobal
* Not all fields are printed. (In some cases, there is no print support for
* the field type; in others, doing so would lead to infinite recursion or
* bloat dump output more than seems useful.)
+ *
+ * NOTE: When adding new entries containing relids and relid bitmapsets,
+ * remember to check that they will be correctly processed by
+ * the remove_self_join_rel function - relid of removing relation will be
+ * correctly replaced with the keeping one.
*----------
*/
#ifndef HAVE_PLANNERINFO_TYPEDEF
@@ -734,7 +739,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -973,7 +978,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3444,4 +3449,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 2e123e08b7..b1f83fafdc 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -192,6 +192,8 @@ extern SortGroupClause *get_sortgroupref_clause_noerr(Index sortref,
* output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0020 /* recurse into PlaceHolderVar
* arguments */
+#define PVC_INCLUDE_CONVERTROWTYPES 0x0040 /* include ConvertRowtypeExprs in
+ * output list */
extern Bitmapset *pull_varnos(PlannerInfo *root, Node *node);
extern Bitmapset *pull_varnos_of_level(PlannerInfo *root, Node *node, int levelsup);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 54869d4401..64ba89fcb8 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 0b6f0f7969..93510cce13 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_elimination;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -113,6 +114,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index ac6d2049e8..074847f1a4 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -41,11 +42,14 @@ typedef enum ReplaceVarsNoMatchOption
} ReplaceVarsNoMatchOption;
+extern Bitmapset *replace_relid(Relids relids, int oldId, int newId);
extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
List *src_rtable, List *src_perminfos);
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
extern void ChangeVarNodes(Node *node, int rt_index, int new_index,
int sublevels_up);
+void ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef);
extern void IncrementVarSublevelsUp(Node *node, int delta_sublevels_up,
int min_sublevels_up);
extern void IncrementVarSublevelsUp_rtable(List *rtable,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 5622750500..ad8ab294ff 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -434,6 +434,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 0c9b312eaf..c07f98b1a8 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6379,6 +6379,1068 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of semi_rhs_exprs links from upper-level semi join to
+-- the removing relation
+explain (verbose, costs off)
+select t1.a from sj t1 where t1.b in (
+ select t2.b from sj t2 join sj t3 on t2.c=t3.c);
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Semi Join
+ Output: t1.a
+ Join Filter: (t1.b = t3.b)
+ -> Seq Scan on public.sj t1
+ Output: t1.a, t1.b, t1.c
+ -> Materialize
+ Output: t3.c, t3.b
+ -> Seq Scan on public.sj t3
+ Output: t3.c, t3.b
+ Filter: (t3.c IS NOT NULL)
+(10 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+---------------------------
+ HashSetOp Except All
+ -> Seq Scan on emp1 c2
+ -> Seq Scan on emp1 c3
+(3 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 91089ac215..4c8c5c953c 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -168,10 +168,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_elimination | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 28ed7910d0..7fc2159349 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 8cfc1053cb..8f8cc2163f 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2409,6 +2409,489 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of semi_rhs_exprs links from upper-level semi join to
+-- the removing relation
+explain (verbose, costs off)
+select t1.a from sj t1 where t1.b in (
+ select t2.b from sj t2 join sj t3 on t2.c=t3.c);
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index fbdb932e6b..ae0b36f3b8 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -391,6 +391,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2582,6 +2583,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -4034,6 +4036,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.5
On Mon, Dec 23, 2024 at 10:25 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 12/9/24 13:03, Alexander Korotkov wrote:
On Sat, Jul 20, 2024 at 2:38 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
We may decide to generate not just a walker but most of the logic in
remove_self_join_rel(). This is probably possible by injecting way
more meta-information into definitions of structures. That isn't
going to be simpler than the current approach. But it is probably
less error-prone: one could realize that if you add a new field to the
structure, it should have a similar pg_node_attr() as surroundings.
But I am afraid that if we go this way, we may end up with an awkward
heap of pg_node_attr() to generate the functionality of
remove_self_join_rel(). Should we better add comments to PlannerInfo
and other relevant structures saying: if you're going to modify this,
consider how that affects remove_self_join_rel()?Any thoughts?
Observing positive cases caused by the SJE feature, I resumed the work
and, following Alexander's suggestion, added developer comments on
checking remove_self_join_rel in case of the PlannerInfo changes.I see now that it helps apply after pull-up transformations and,
sometimes, because of clauses derived from EquivalenceClasses (see the
patch comment for examples). So, it is not only about dumb ORM-generated
queries.Also, multiple code changes since the v7 version caused corresponding
changes in the SJE code on rebase. So, the new version provided in the
attachment needs a fresh review.
I've got an off-list report from Alexander Lakhin. The failing query
is added to the regression tests in the revised patch. The query was
failing with an error "negative bitmapset member not allowed" issued
in adjust_relid_set(). In order to fix that I've to teach
adjust_relid_set() about negative newrelid, it became even more
similar to replace_relid(). Notable this error happens in
remove_leftjoinrel_from_query(). This seems to be consequence that we
made self-join removal and left-joins removal use common
infrastructure.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v9-0001-Remove-useless-self-joins.patchapplication/octet-stream; name=v9-0001-Remove-useless-self-joins.patchDownload
From b544d94a2d4260d01394e6cd65e88a2b517382c1 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Fri, 20 Dec 2024 15:55:09 +0700
Subject: [PATCH v9] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
improves total planner prediction for the query.
This feature is dedicated to avoiding redundancy which can appear after pull-up
transformations or the creation of an EquivalenceClass-derived clause like
the below:
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x=t1.x);
SELECT * FROM t1,t2, t1 t3 WHERE t1.x=t2.x AND t2.x=t3.x;
After some additional coding, it would potentially reduce redundancy caused by
subquery pull-up after unnecessary outer join removal. Something like that
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x=t1.x);
Also, it can drastically help to join partitioned tables, removing entries even
before their expansion.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/equivclass.c | 3 +-
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1238 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 7 +-
src/backend/rewrite/rewriteManip.c | 137 ++-
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 40 +-
src/include/optimizer/optimizer.h | 2 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/include/rewrite/rewriteManip.h | 4 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1062 ++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 26 +
src/test/regress/sql/join.sql | 483 ++++++++
src/tools/pgindent/typedefs.list | 3 +
19 files changed, 2977 insertions(+), 140 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f1ab614575a..ad2ab5745c0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5680,6 +5680,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_elimination" xreflabel="enable_self_join_elimination">
+ <term><varname>enable_self_join_elimination</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_elimination</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 7cafaca33c5..0f9ecf5ee8b 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -852,7 +852,8 @@ find_computable_ec_member(PlannerInfo *root,
exprvars = pull_var_clause((Node *) exprs,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ PVC_INCLUDE_PLACEHOLDERS |
+ PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc, ec->ec_members)
{
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 5f428e835b0..0209a12c122 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4149,6 +4149,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -4204,6 +4220,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -4255,6 +4272,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -4297,7 +4332,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index b33fc671775..96817bf3dfb 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -30,27 +31,47 @@
#include "optimizer/placeholder.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. It is needed to perform a sorting procedure and simplify
+ * the search of SJE-candidate baserels referencing the same database relation.
+ * Having collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts to
+ * remove self-joins.
+ * Preliminary sorting prevents quadratic behaviour that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_elimination;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
- int relid, int ojrelid);
+ int relid, int ojrelid, int subst);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -88,7 +109,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -276,7 +297,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -288,36 +309,31 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -341,20 +357,33 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
sjinf->min_righthand = bms_copy(sjinf->min_righthand);
sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
- /* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ /* Now remove relid from the sets: */
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+
+ if (sjinfo != NULL)
+ {
+ Assert(subst <= 0 && ojrelid > 0);
+
+ /* Remove ojrelid bits from the sets: */
+ sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
+ sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
+ sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
+ sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ /* relid cannot appear in these fields, but ojrelid can: */
+ sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
+ sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
+ sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
+ sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ }
+ else
+ {
+ Assert(subst > 0 && ojrelid == -1);
+
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
+ }
}
/*
@@ -375,10 +404,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -388,21 +417,111 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
/* Reduce ph_needed to contain only "relation 0"; see below */
if (bms_is_member(0, phinfo->ph_needed))
phinfo->ph_needed = bms_make_singleton(0);
else
phinfo->ph_needed = NULL;
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+ /*
+ * Likewise remove references from EquivalenceClasses.
+ */
+ foreach(l, root->eq_classes)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
+
+ if (bms_is_member(relid, ec->ec_relids) ||
+ (sjinfo == NULL || bms_is_member(ojrelid, ec->ec_relids)))
+ remove_rel_from_eclass(ec, relid, ojrelid, subst);
+ }
+
+ /*
+ * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
+ * ph_needed relid sets. These have to be known accurately, else we may
+ * fail to remove other now-removable outer joins. And our removal of the
+ * join clause(s) for this outer join may mean that Vars that were
+ * formerly needed no longer are. So we have to do this honestly by
+ * repeating the construction of those relid sets. We can cheat to one
+ * small extent: we can avoid re-examining the targetlist and HAVING qual
+ * by preserving "relation 0" bits from the existing relid sets. This is
+ * safe because we'd never remove such references.
+ *
+ * So, start by removing all other bits from attr_needed sets and
+ * lateral_vars lists. (We already did this above for ph_needed.)
+ */
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[rti];
+ int attroff;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == rti); /* sanity check on array */
+
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0;
+ attroff--)
+ {
+ if (bms_is_member(0, otherrel->attr_needed[attroff]))
+ otherrel->attr_needed[attroff] = bms_make_singleton(0);
+ else
+ otherrel->attr_needed[attroff] = NULL;
+ }
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
+ }
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ RelOptInfo *rel = find_base_rel(root, relid);
+ int ojrelid = sjinfo->ojrelid;
+ Relids joinrelids;
+ Relids join_plus_commute;
+ List *joininfos;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
+
/*
* Remove any joinquals referencing the rel from the joininfo lists.
*
@@ -465,18 +584,6 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
}
}
- /*
- * Likewise remove references from EquivalenceClasses.
- */
- foreach(l, root->eq_classes)
- {
- EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
-
- if (bms_is_member(relid, ec->ec_relids) ||
- bms_is_member(ojrelid, ec->ec_relids))
- remove_rel_from_eclass(ec, relid, ojrelid);
- }
-
/*
* There may be references to the rel in root->fkey_list, but if so,
* match_foreign_keys_to_quals() will get rid of them.
@@ -492,42 +599,6 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
/* And nuke the RelOptInfo, just in case there's another access path */
pfree(rel);
- /*
- * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
- * ph_needed relid sets. These have to be known accurately, else we may
- * fail to remove other now-removable outer joins. And our removal of the
- * join clause(s) for this outer join may mean that Vars that were
- * formerly needed no longer are. So we have to do this honestly by
- * repeating the construction of those relid sets. We can cheat to one
- * small extent: we can avoid re-examining the targetlist and HAVING qual
- * by preserving "relation 0" bits from the existing relid sets. This is
- * safe because we'd never remove such references.
- *
- * So, start by removing all other bits from attr_needed sets. (We
- * already did this above for ph_needed.)
- */
- for (rti = 1; rti < root->simple_rel_array_size; rti++)
- {
- RelOptInfo *otherrel = root->simple_rel_array[rti];
- int attroff;
-
- /* there may be empty slots corresponding to non-baserel RTEs */
- if (otherrel == NULL)
- continue;
-
- Assert(otherrel->relid == rti); /* sanity check on array */
-
- for (attroff = otherrel->max_attr - otherrel->min_attr;
- attroff >= 0;
- attroff--)
- {
- if (bms_is_member(0, otherrel->attr_needed[attroff]))
- otherrel->attr_needed[attroff] = bms_make_singleton(0);
- else
- otherrel->attr_needed[attroff] = NULL;
- }
- }
-
/*
* Now repeat construction of attr_needed bits coming from all other
* sources.
@@ -607,13 +678,13 @@ remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid)
* level(s).
*/
static void
-remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
+remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid, int subst)
{
ListCell *lc;
/* Fix up the EC's overall relids */
- ec->ec_relids = bms_del_member(ec->ec_relids, relid);
- ec->ec_relids = bms_del_member(ec->ec_relids, ojrelid);
+ ec->ec_relids = replace_relid(ec->ec_relids, relid, subst);
+ ec->ec_relids = replace_relid(ec->ec_relids, ojrelid, subst);
/*
* Fix up the member expressions. Any non-const member that ends with
@@ -625,11 +696,11 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc);
if (bms_is_member(relid, cur_em->em_relids) ||
- bms_is_member(ojrelid, cur_em->em_relids))
+ (ojrelid != -1 && bms_is_member(ojrelid, cur_em->em_relids)))
{
Assert(!cur_em->em_is_const);
- cur_em->em_relids = bms_del_member(cur_em->em_relids, relid);
- cur_em->em_relids = bms_del_member(cur_em->em_relids, ojrelid);
+ cur_em->em_relids = replace_relid(cur_em->em_relids, relid, subst);
+ cur_em->em_relids = replace_relid(cur_em->em_relids, ojrelid, subst);
if (bms_is_empty(cur_em->em_relids))
ec->ec_members = foreach_delete_current(ec->ec_members, lc);
}
@@ -640,7 +711,10 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
- remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
+ if (ojrelid == -1)
+ ChangeVarNodes((Node *) rinfo, relid, subst, 0);
+ else
+ remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
}
/*
@@ -844,9 +918,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * extra_clauses contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the extra_clauses, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +938,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,9 +1256,33 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
+ * additional clauses from a baserestrictinfo list that were used to prove
+ * uniqueness. A non NULL 'extra_clauses' indicates that we're checking
+ * for self-join and correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1193,17 +1297,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels, because we need extra clauses to be valid for our case.
+ * Also, for self-join checking we've filtered the clauses list. Thus,
+ * for a self-join search, we can match only the result cached for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1335,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1349,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1404,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1312,17 +1435,898 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+
+ foreach_node(EquivalenceMember, em, ec->ec_members)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach_node(EquivalenceMember, other, new_members)
+ {
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach_node(RestrictInfo, rinfo, ec->ec_sources)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach_node(RestrictInfo, other, new_sources)
+ {
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ *
+ * NOTE: Remember to keep the code in sync with PlannerInfo to be sure all
+ * cached relids and relid bitmapsets can be correctly cleaned during the self
+ * join elimination procedure.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach_node(RestrictInfo, rinfo, joininfos)
+ {
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach_node(RestrictInfo, rinfo, toRemove->baserestrictinfo)
+ {
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach_node(RestrictInfo, rinfo, binfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->baserestrictinfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach_node(RestrictInfo, rinfo, jinfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->joininfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Replace varno in all the query structures, except nodes RangeTblRef
+ * otherwise later remove_rel_from_joinlist will yield errors.
+ */
+ ChangeVarNodesExtended((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ replace_relid(root->all_result_relids, toRemove->relid, toKeep->relid);
+ replace_relid(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+
+ /*
+ * Now repeat construction of attr_needed bits coming from all other
+ * sources.
+ */
+ rebuild_placeholder_attr_needed(root);
+ rebuild_joinclause_attr_needed(root);
+ rebuild_eclass_attr_needed(root);
+ rebuild_lateral_attr_needed(root);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach_node(RestrictInfo, rinfo, joinquals)
+ {
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ foreach_node(RestrictInfo, rinfo, uclauses)
+ {
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach_node(RestrictInfo, orinfo, outer->baserestrictinfo)
+ {
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+ if (restrictlist == NIL)
+ continue;
+
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ Assert(list_length(restrictlist) ==
+ (list_length(selfjoinquals) + list_length(otherjoinquals)));
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * uclauses is the copy of outer->baserestrictinfo that associated with a index.
+ * We already proven that based on {outer->baserestrictinfo, selfjoinquals}
+ * comparsion with unique index we can guarantees that for the outer rel there is
+ * at most one row where columns equal given values. Sometimes that is not ok. e.g.
+ * "where s1.b = s2.b and s1.a = 1 and s2.a = 2". (the index is based on column(a,b).
+ * in these scarenio, we also need to validate equality of outer->baserestrictinfo
+ * with inner->outer->baserestrictinfo,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be removed,
+ * and these relations should not have TABLESAMPLE clauses
+ * specified. Removing a relation with TABLESAMPLE clause could
+ * potentially change the syntax of the query.
+ * Because of UPDATE/DELETE EPQ meachanism, currently Query->resultRelation
+ * or Query->mergeTargetRelation associated rel cannot be eliminated.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL &&
+ varno != root->parse->resultRelation &&
+ varno != root->parse->mergeTargetRelation)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_elimination || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index ade23fd9d56..5467e094ca7 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -233,6 +233,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 7c27dc24e21..af9364e956a 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -639,7 +639,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
@@ -647,6 +647,9 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
* be an RTE corresponding to each setop's output.
+ * Note, we use this not subquery's targetlist but subroot->parse's
+ * targetlist, because it was revised by self-join removal. subquery's
+ * targetlist might contain the references to the removed relids.
*/
if (pNumGroups)
{
@@ -659,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index 047396e390b..d0598fa3cdc 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -535,6 +535,33 @@ offset_relid_set(Relids relids, int offset)
return result;
}
+/*
+ * Substitute newId by oldId in relids.
+ *
+ * We must make a copy of the original Bitmapset before making any
+ * modifications, because the same pointer to it might be shared among
+ * different places.
+ * Also, this function can be used in 'delete only' mode (newId < 0).
+ * It allows us to utilise the same code in the remove_useless_joins and the
+ * remove_self_joins features.
+ */
+Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (IS_SPECIAL_VARNO(oldId))
+ return relids;
+
+ /* Delete relid without substitution. */
+ if (IS_SPECIAL_VARNO(newId))
+ return bms_del_member(bms_copy(relids), oldId);
+
+ /* Substitute newId for oldId. */
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
/*
* ChangeVarNodes - adjust Var nodes for a specific change of RT index
*
@@ -543,6 +570,7 @@ offset_relid_set(Relids relids, int offset)
* to 'new_index'. The varnosyn fields are changed too. Also, adjust other
* nodes that contain rangetable indexes, such as RangeTblRef and JoinExpr.
*
+ * change_RangeTblRef: do we change RangeTblRef or not. see ChangeVarNodesExtended.
* NOTE: although this has the form of a walker, we cheat and modify the
* nodes in-place. The given expression tree should have been copied
* earlier to ensure that no unwanted side-effects occur!
@@ -553,6 +581,7 @@ typedef struct
int rt_index;
int new_index;
int sublevels_up;
+ bool change_RangeTblRef;
} ChangeVarNodes_context;
static bool
@@ -585,7 +614,7 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
cexpr->cvarno = context->new_index;
return false;
}
- if (IsA(node, RangeTblRef))
+ if (IsA(node, RangeTblRef) && context->change_RangeTblRef)
{
RangeTblRef *rtr = (RangeTblRef *) node;
@@ -632,6 +661,75 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ replace_relid(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ replace_relid(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ replace_relid(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ replace_relid(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ replace_relid(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ replace_relid(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * for SJE (self join elimination), changing varnos will make t1.a = t2.a
+ * to "t1.a = t1.a", which is always true, we can optizmied it out.
+ * to optimzied it out, we use equal to validate,
+ * but we also need to add a not null qual (NullTest).
+ *
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
@@ -664,32 +762,31 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
return expression_tree_walker(node, ChangeVarNodes_walker, context);
}
+/*
+ * almost equivalent to ChangeVarNodes. also see comments in ChangeVarNodes.
+ * difference with ChangeVarNodes:
+ * when we use ChangeVarNodes for node we change all of it's underlying nodes.
+ * but for SJE we don't want to change node type: RangeTblRef, in some occasion.
+ * because SJE last step, remove_rel_from_joinlist is to
+ * remove left node (RangeTblRef) one by one. if in any case while using
+ * ChangeVarNodes, by accidently leaf node RangeTblRef also being updated
+ * then remove_rel_from_joinlist will have error.
+*/
void
-ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef)
{
ChangeVarNodes_context context;
context.rt_index = rt_index;
context.new_index = new_index;
context.sublevels_up = sublevels_up;
+ context.change_RangeTblRef = change_RangeTblRef;
- /*
- * Must be prepared to start with a Query or a bare expression tree; if
- * it's a Query, go straight to query_tree_walker to make sure that
- * sublevels_up doesn't get incremented prematurely.
- */
if (node && IsA(node, Query))
{
Query *qry = (Query *) node;
- /*
- * If we are starting at a Query, and sublevels_up is zero, then we
- * must also fix rangetable indexes in the Query itself --- namely
- * resultRelation, mergeTargetRelation, exclRelIndex and rowMarks
- * entries. sublevels_up cannot be zero when recursing into a
- * subquery, so there's no need to have the same logic inside
- * ChangeVarNodes_walker.
- */
if (sublevels_up == 0)
{
ListCell *l;
@@ -700,7 +797,6 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
if (qry->mergeTargetRelation == rt_index)
qry->mergeTargetRelation = new_index;
- /* this is unlikely to ever be used, but ... */
if (qry->onConflict && qry->onConflict->exclRelIndex == rt_index)
qry->onConflict->exclRelIndex = new_index;
@@ -718,6 +814,12 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
ChangeVarNodes_walker(node, &context);
}
+void
+ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+{
+ ChangeVarNodesExtended(node, rt_index, new_index, sublevels_up, true);
+}
+
/*
* Substitute newrelid for oldrelid in a Relid set
*
@@ -735,7 +837,8 @@ adjust_relid_set(Relids relids, int oldrelid, int newrelid)
relids = bms_copy(relids);
/* Remove old, add new */
relids = bms_del_member(relids, oldrelid);
- relids = bms_add_member(relids, newrelid);
+ if (!IS_SPECIAL_VARNO(newrelid))
+ relids = bms_add_member(relids, newrelid);
}
return relids;
}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index c9d8cd796a8..80865401137 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -986,6 +986,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_elimination", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_elimination,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 54ee17697e5..1ac9c55a741 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -185,6 +185,11 @@ typedef struct PlannerGlobal
* Not all fields are printed. (In some cases, there is no print support for
* the field type; in others, doing so would lead to infinite recursion or
* bloat dump output more than seems useful.)
+ *
+ * NOTE: When adding new entries containing relids and relid bitmapsets,
+ * remember to check that they will be correctly processed by
+ * the remove_self_join_rel function - relid of removing relation will be
+ * correctly replaced with the keeping one.
*----------
*/
#ifndef HAVE_PLANNERINFO_TYPEDEF
@@ -734,7 +739,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -973,7 +978,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3444,4 +3449,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 734c82a27db..a3bcc51d1db 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -192,6 +192,8 @@ extern SortGroupClause *get_sortgroupref_clause_noerr(Index sortref,
* output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0020 /* recurse into PlaceHolderVar
* arguments */
+#define PVC_INCLUDE_CONVERTROWTYPES 0x0040 /* include ConvertRowtypeExprs in
+ * output list */
extern Bitmapset *pull_varnos(PlannerInfo *root, Node *node);
extern Bitmapset *pull_varnos_of_level(PlannerInfo *root, Node *node, int levelsup);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 46955d128f0..bc5dfd7db41 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index fee3378bbe3..5a930199611 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_elimination;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -113,6 +114,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index 1070b93a9d3..e327bc0c5fe 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -41,11 +42,14 @@ typedef enum ReplaceVarsNoMatchOption
} ReplaceVarsNoMatchOption;
+extern Bitmapset *replace_relid(Relids relids, int oldId, int newId);
extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
List *src_rtable, List *src_perminfos);
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
extern void ChangeVarNodes(Node *node, int rt_index, int new_index,
int sublevels_up);
+void ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef);
extern void IncrementVarSublevelsUp(Node *node, int delta_sublevels_up,
int min_sublevels_up);
extern void IncrementVarSublevelsUp_rtable(List *rtable,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 56227505009..ad8ab294ff6 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -434,6 +434,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 079fcf46f0d..2a9349f8774 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6380,6 +6380,1068 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of semi_rhs_exprs links from upper-level semi join to
+-- the removing relation
+explain (verbose, costs off)
+select t1.a from sj t1 where t1.b in (
+ select t2.b from sj t2 join sj t3 on t2.c=t3.c);
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Semi Join
+ Output: t1.a
+ Join Filter: (t1.b = t3.b)
+ -> Seq Scan on public.sj t1
+ Output: t1.a, t1.b, t1.c
+ -> Materialize
+ Output: t3.c, t3.b
+ -> Seq Scan on public.sj t3
+ Output: t3.c, t3.b
+ Filter: (t3.c IS NOT NULL)
+(10 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+---------------------------
+ HashSetOp Except All
+ -> Seq Scan on emp1 c2
+ -> Seq Scan on emp1 c3
+(3 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 91089ac215f..4c8c5c953cb 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -168,10 +168,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_elimination | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 28ed7910d01..ea9bda92301 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
@@ -303,3 +319,13 @@ explain (costs off)
select * from tbl_nocom t1 full join tbl_nocom t2 on t2.a = t1.b;
abort;
+
+explain (costs off)
+select subq_0.c0 as c8 from
+ (select case when true then ref_0.phase else ref_0.phase end as c0
+ from
+ pg_catalog.pg_stat_progress_cluster as ref_0
+ right join public.ec1 as sample_0
+ on true
+ ) as subq_0
+ right join information_schema.routines on true;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 779d56cb30f..60cd0bb65eb 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2409,6 +2409,489 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of semi_rhs_exprs links from upper-level semi join to
+-- the removing relation
+explain (verbose, costs off)
+select t1.a from sj t1 where t1.b in (
+ select t2.b from sj t2 join sj t3 on t2.c=t3.c);
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index eb93debe108..b405c5b8eea 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -392,6 +392,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2584,6 +2585,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -4036,6 +4038,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.5 (Apple Git-154)
On 1/10/25 10:13, Alexander Korotkov wrote:
I've got an off-list report from Alexander Lakhin. The failing query
is added to the regression tests in the revised patch. The query was
failing with an error "negative bitmapset member not allowed" issued
in adjust_relid_set(). In order to fix that I've to teach
adjust_relid_set() about negative newrelid, it became even more
similar to replace_relid(). Notable this error happens in
remove_leftjoinrel_from_query(). This seems to be consequence that we
made self-join removal and left-joins removal use common
infrastructure.
I have analysed your change. It pertains to the situation where PHV
references both the outer join and its inner join in phrels but actually
utilises only the outer side of the join.
I agree with your proposed fix and have just rewritten the query to make
it simpler and more consistent with the regression tests (see new
version of the patch in attachment).
Additionally, I noticed that the remove_useless_joins code uses the term
"join removal" in the join.sql tests. I wonder if it might be more
appropriate to revert the name from "Self Join Elimination" back to
"Self Join Removal."
--
regards, Andrei Lepikhov
Attachments:
v10-0001-Remove-useless-self-joins.patchtext/x-patch; charset=UTF-8; name=v10-0001-Remove-useless-self-joins.patchDownload
From 88e2cc7f8d76b32841176c07c69edf0611df5c9f Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Fri, 20 Dec 2024 15:55:09 +0700
Subject: [PATCH v10] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
improves total planner prediction for the query.
This feature is dedicated to avoiding redundancy which can appear after pull-up
transformations or the creation of an EquivalenceClass-derived clause like
the below:
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x=t1.x);
SELECT * FROM t1,t2, t1 t3 WHERE t1.x=t2.x AND t2.x=t3.x;
After some additional coding, it would potentially reduce redundancy caused by
subquery pull-up after unnecessary outer join removal. Something like that
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x=t1.x);
Also, it can drastically help to join partitioned tables, removing entries even
before their expansion.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/equivclass.c | 3 +-
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1238 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 7 +-
src/backend/rewrite/rewriteManip.c | 137 ++-
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 40 +-
src/include/optimizer/optimizer.h | 2 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/include/rewrite/rewriteManip.h | 4 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1081 ++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 494 ++++++++
src/tools/pgindent/typedefs.list | 3 +
19 files changed, 2997 insertions(+), 140 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a782f10998..230f2902a0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5538,6 +5538,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_elimination" xreflabel="enable_self_join_elimination">
+ <term><varname>enable_self_join_elimination</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_elimination</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 7cafaca33c..0f9ecf5ee8 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -852,7 +852,8 @@ find_computable_ec_member(PlannerInfo *root,
exprvars = pull_var_clause((Node *) exprs,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ PVC_INCLUDE_PLACEHOLDERS |
+ PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc, ec->ec_members)
{
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index fa3edf60f3..65dc234d1f 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4149,6 +4149,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -4204,6 +4220,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -4255,6 +4272,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -4297,7 +4332,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index b33fc67177..96817bf3df 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -30,27 +31,47 @@
#include "optimizer/placeholder.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. It is needed to perform a sorting procedure and simplify
+ * the search of SJE-candidate baserels referencing the same database relation.
+ * Having collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts to
+ * remove self-joins.
+ * Preliminary sorting prevents quadratic behaviour that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_elimination;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
- int relid, int ojrelid);
+ int relid, int ojrelid, int subst);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -88,7 +109,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -276,7 +297,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -288,36 +309,31 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = replace_relid(root->all_baserels, relid, subst);
+ root->outer_join_rels = replace_relid(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, relid, subst);
+ root->all_query_rels = replace_relid(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -341,20 +357,33 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
sjinf->min_righthand = bms_copy(sjinf->min_righthand);
sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
- /* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ /* Now remove relid from the sets: */
+ sjinf->min_lefthand = replace_relid(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = replace_relid(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = replace_relid(sjinf->syn_righthand, relid, subst);
+
+ if (sjinfo != NULL)
+ {
+ Assert(subst <= 0 && ojrelid > 0);
+
+ /* Remove ojrelid bits from the sets: */
+ sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
+ sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
+ sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
+ sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ /* relid cannot appear in these fields, but ojrelid can: */
+ sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
+ sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
+ sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
+ sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ }
+ else
+ {
+ Assert(subst > 0 && ojrelid == -1);
+
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
+ }
}
/*
@@ -375,10 +404,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -388,21 +417,111 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = replace_relid(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
/* Reduce ph_needed to contain only "relation 0"; see below */
if (bms_is_member(0, phinfo->ph_needed))
phinfo->ph_needed = bms_make_singleton(0);
else
phinfo->ph_needed = NULL;
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = replace_relid(phv->phrels, relid, subst);
+ phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+ /*
+ * Likewise remove references from EquivalenceClasses.
+ */
+ foreach(l, root->eq_classes)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
+
+ if (bms_is_member(relid, ec->ec_relids) ||
+ (sjinfo == NULL || bms_is_member(ojrelid, ec->ec_relids)))
+ remove_rel_from_eclass(ec, relid, ojrelid, subst);
+ }
+
+ /*
+ * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
+ * ph_needed relid sets. These have to be known accurately, else we may
+ * fail to remove other now-removable outer joins. And our removal of the
+ * join clause(s) for this outer join may mean that Vars that were
+ * formerly needed no longer are. So we have to do this honestly by
+ * repeating the construction of those relid sets. We can cheat to one
+ * small extent: we can avoid re-examining the targetlist and HAVING qual
+ * by preserving "relation 0" bits from the existing relid sets. This is
+ * safe because we'd never remove such references.
+ *
+ * So, start by removing all other bits from attr_needed sets and
+ * lateral_vars lists. (We already did this above for ph_needed.)
+ */
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[rti];
+ int attroff;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == rti); /* sanity check on array */
+
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0;
+ attroff--)
+ {
+ if (bms_is_member(0, otherrel->attr_needed[attroff]))
+ otherrel->attr_needed[attroff] = bms_make_singleton(0);
+ else
+ otherrel->attr_needed[attroff] = NULL;
+ }
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
+ }
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ RelOptInfo *rel = find_base_rel(root, relid);
+ int ojrelid = sjinfo->ojrelid;
+ Relids joinrelids;
+ Relids join_plus_commute;
+ List *joininfos;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
+
/*
* Remove any joinquals referencing the rel from the joininfo lists.
*
@@ -465,18 +584,6 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
}
}
- /*
- * Likewise remove references from EquivalenceClasses.
- */
- foreach(l, root->eq_classes)
- {
- EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
-
- if (bms_is_member(relid, ec->ec_relids) ||
- bms_is_member(ojrelid, ec->ec_relids))
- remove_rel_from_eclass(ec, relid, ojrelid);
- }
-
/*
* There may be references to the rel in root->fkey_list, but if so,
* match_foreign_keys_to_quals() will get rid of them.
@@ -492,42 +599,6 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
/* And nuke the RelOptInfo, just in case there's another access path */
pfree(rel);
- /*
- * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
- * ph_needed relid sets. These have to be known accurately, else we may
- * fail to remove other now-removable outer joins. And our removal of the
- * join clause(s) for this outer join may mean that Vars that were
- * formerly needed no longer are. So we have to do this honestly by
- * repeating the construction of those relid sets. We can cheat to one
- * small extent: we can avoid re-examining the targetlist and HAVING qual
- * by preserving "relation 0" bits from the existing relid sets. This is
- * safe because we'd never remove such references.
- *
- * So, start by removing all other bits from attr_needed sets. (We
- * already did this above for ph_needed.)
- */
- for (rti = 1; rti < root->simple_rel_array_size; rti++)
- {
- RelOptInfo *otherrel = root->simple_rel_array[rti];
- int attroff;
-
- /* there may be empty slots corresponding to non-baserel RTEs */
- if (otherrel == NULL)
- continue;
-
- Assert(otherrel->relid == rti); /* sanity check on array */
-
- for (attroff = otherrel->max_attr - otherrel->min_attr;
- attroff >= 0;
- attroff--)
- {
- if (bms_is_member(0, otherrel->attr_needed[attroff]))
- otherrel->attr_needed[attroff] = bms_make_singleton(0);
- else
- otherrel->attr_needed[attroff] = NULL;
- }
- }
-
/*
* Now repeat construction of attr_needed bits coming from all other
* sources.
@@ -607,13 +678,13 @@ remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid)
* level(s).
*/
static void
-remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
+remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid, int subst)
{
ListCell *lc;
/* Fix up the EC's overall relids */
- ec->ec_relids = bms_del_member(ec->ec_relids, relid);
- ec->ec_relids = bms_del_member(ec->ec_relids, ojrelid);
+ ec->ec_relids = replace_relid(ec->ec_relids, relid, subst);
+ ec->ec_relids = replace_relid(ec->ec_relids, ojrelid, subst);
/*
* Fix up the member expressions. Any non-const member that ends with
@@ -625,11 +696,11 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc);
if (bms_is_member(relid, cur_em->em_relids) ||
- bms_is_member(ojrelid, cur_em->em_relids))
+ (ojrelid != -1 && bms_is_member(ojrelid, cur_em->em_relids)))
{
Assert(!cur_em->em_is_const);
- cur_em->em_relids = bms_del_member(cur_em->em_relids, relid);
- cur_em->em_relids = bms_del_member(cur_em->em_relids, ojrelid);
+ cur_em->em_relids = replace_relid(cur_em->em_relids, relid, subst);
+ cur_em->em_relids = replace_relid(cur_em->em_relids, ojrelid, subst);
if (bms_is_empty(cur_em->em_relids))
ec->ec_members = foreach_delete_current(ec->ec_members, lc);
}
@@ -640,7 +711,10 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
- remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
+ if (ojrelid == -1)
+ ChangeVarNodes((Node *) rinfo, relid, subst, 0);
+ else
+ remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
}
/*
@@ -844,9 +918,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * extra_clauses contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the extra_clauses, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +938,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,9 +1256,33 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
+ * additional clauses from a baserestrictinfo list that were used to prove
+ * uniqueness. A non NULL 'extra_clauses' indicates that we're checking
+ * for self-join and correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1193,17 +1297,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels, because we need extra clauses to be valid for our case.
+ * Also, for self-join checking we've filtered the clauses list. Thus,
+ * for a self-join search, we can match only the result cached for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1335,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1349,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1404,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1312,17 +1435,898 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+
+ foreach_node(EquivalenceMember, em, ec->ec_members)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = replace_relid(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = replace_relid(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach_node(EquivalenceMember, other, new_members)
+ {
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach_node(RestrictInfo, rinfo, ec->ec_sources)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach_node(RestrictInfo, other, new_sources)
+ {
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = replace_relid(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ *
+ * NOTE: Remember to keep the code in sync with PlannerInfo to be sure all
+ * cached relids and relid bitmapsets can be correctly cleaned during the self
+ * join elimination procedure.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach_node(RestrictInfo, rinfo, joininfos)
+ {
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach_node(RestrictInfo, rinfo, toRemove->baserestrictinfo)
+ {
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach_node(RestrictInfo, rinfo, binfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->baserestrictinfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach_node(RestrictInfo, rinfo, jinfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->joininfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = replace_relid(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Replace varno in all the query structures, except nodes RangeTblRef
+ * otherwise later remove_rel_from_joinlist will yield errors.
+ */
+ ChangeVarNodesExtended((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ replace_relid(root->all_result_relids, toRemove->relid, toKeep->relid);
+ replace_relid(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+
+ /*
+ * Now repeat construction of attr_needed bits coming from all other
+ * sources.
+ */
+ rebuild_placeholder_attr_needed(root);
+ rebuild_joinclause_attr_needed(root);
+ rebuild_eclass_attr_needed(root);
+ rebuild_lateral_attr_needed(root);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach_node(RestrictInfo, rinfo, joinquals)
+ {
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ foreach_node(RestrictInfo, rinfo, uclauses)
+ {
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach_node(RestrictInfo, orinfo, outer->baserestrictinfo)
+ {
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+ if (restrictlist == NIL)
+ continue;
+
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ Assert(list_length(restrictlist) ==
+ (list_length(selfjoinquals) + list_length(otherjoinquals)));
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * uclauses is the copy of outer->baserestrictinfo that associated with a index.
+ * We already proven that based on {outer->baserestrictinfo, selfjoinquals}
+ * comparsion with unique index we can guarantees that for the outer rel there is
+ * at most one row where columns equal given values. Sometimes that is not ok. e.g.
+ * "where s1.b = s2.b and s1.a = 1 and s2.a = 2". (the index is based on column(a,b).
+ * in these scarenio, we also need to validate equality of outer->baserestrictinfo
+ * with inner->outer->baserestrictinfo,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be removed,
+ * and these relations should not have TABLESAMPLE clauses
+ * specified. Removing a relation with TABLESAMPLE clause could
+ * potentially change the syntax of the query.
+ * Because of UPDATE/DELETE EPQ meachanism, currently Query->resultRelation
+ * or Query->mergeTargetRelation associated rel cannot be eliminated.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL &&
+ varno != root->parse->resultRelation &&
+ varno != root->parse->mergeTargetRelation)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_elimination || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index ade23fd9d5..5467e094ca 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -233,6 +233,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 7c27dc24e2..af9364e956 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -639,7 +639,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
@@ -647,6 +647,9 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
* be an RTE corresponding to each setop's output.
+ * Note, we use this not subquery's targetlist but subroot->parse's
+ * targetlist, because it was revised by self-join removal. subquery's
+ * targetlist might contain the references to the removed relids.
*/
if (pNumGroups)
{
@@ -659,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index a115b217c9..79116d5fc8 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -536,6 +536,33 @@ offset_relid_set(Relids relids, int offset)
return result;
}
+/*
+ * Substitute newId by oldId in relids.
+ *
+ * We must make a copy of the original Bitmapset before making any
+ * modifications, because the same pointer to it might be shared among
+ * different places.
+ * Also, this function can be used in 'delete only' mode (newId < 0).
+ * It allows us to utilise the same code in the remove_useless_joins and the
+ * remove_self_joins features.
+ */
+Bitmapset *
+replace_relid(Relids relids, int oldId, int newId)
+{
+ if (IS_SPECIAL_VARNO(oldId))
+ return relids;
+
+ /* Delete relid without substitution. */
+ if (IS_SPECIAL_VARNO(newId))
+ return bms_del_member(bms_copy(relids), oldId);
+
+ /* Substitute newId for oldId. */
+ if (bms_is_member(oldId, relids))
+ return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
+
+ return relids;
+}
+
/*
* ChangeVarNodes - adjust Var nodes for a specific change of RT index
*
@@ -544,6 +571,7 @@ offset_relid_set(Relids relids, int offset)
* to 'new_index'. The varnosyn fields are changed too. Also, adjust other
* nodes that contain rangetable indexes, such as RangeTblRef and JoinExpr.
*
+ * change_RangeTblRef: do we change RangeTblRef or not. see ChangeVarNodesExtended.
* NOTE: although this has the form of a walker, we cheat and modify the
* nodes in-place. The given expression tree should have been copied
* earlier to ensure that no unwanted side-effects occur!
@@ -554,6 +582,7 @@ typedef struct
int rt_index;
int new_index;
int sublevels_up;
+ bool change_RangeTblRef;
} ChangeVarNodes_context;
static bool
@@ -586,7 +615,7 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
cexpr->cvarno = context->new_index;
return false;
}
- if (IsA(node, RangeTblRef))
+ if (IsA(node, RangeTblRef) && context->change_RangeTblRef)
{
RangeTblRef *rtr = (RangeTblRef *) node;
@@ -633,6 +662,75 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ replace_relid(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ replace_relid(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ replace_relid(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ replace_relid(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ replace_relid(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ replace_relid(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * for SJE (self join elimination), changing varnos will make t1.a = t2.a
+ * to "t1.a = t1.a", which is always true, we can optizmied it out.
+ * to optimzied it out, we use equal to validate,
+ * but we also need to add a not null qual (NullTest).
+ *
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
@@ -665,32 +763,31 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
return expression_tree_walker(node, ChangeVarNodes_walker, context);
}
+/*
+ * almost equivalent to ChangeVarNodes. also see comments in ChangeVarNodes.
+ * difference with ChangeVarNodes:
+ * when we use ChangeVarNodes for node we change all of it's underlying nodes.
+ * but for SJE we don't want to change node type: RangeTblRef, in some occasion.
+ * because SJE last step, remove_rel_from_joinlist is to
+ * remove left node (RangeTblRef) one by one. if in any case while using
+ * ChangeVarNodes, by accidently leaf node RangeTblRef also being updated
+ * then remove_rel_from_joinlist will have error.
+*/
void
-ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef)
{
ChangeVarNodes_context context;
context.rt_index = rt_index;
context.new_index = new_index;
context.sublevels_up = sublevels_up;
+ context.change_RangeTblRef = change_RangeTblRef;
- /*
- * Must be prepared to start with a Query or a bare expression tree; if
- * it's a Query, go straight to query_tree_walker to make sure that
- * sublevels_up doesn't get incremented prematurely.
- */
if (node && IsA(node, Query))
{
Query *qry = (Query *) node;
- /*
- * If we are starting at a Query, and sublevels_up is zero, then we
- * must also fix rangetable indexes in the Query itself --- namely
- * resultRelation, mergeTargetRelation, exclRelIndex and rowMarks
- * entries. sublevels_up cannot be zero when recursing into a
- * subquery, so there's no need to have the same logic inside
- * ChangeVarNodes_walker.
- */
if (sublevels_up == 0)
{
ListCell *l;
@@ -701,7 +798,6 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
if (qry->mergeTargetRelation == rt_index)
qry->mergeTargetRelation = new_index;
- /* this is unlikely to ever be used, but ... */
if (qry->onConflict && qry->onConflict->exclRelIndex == rt_index)
qry->onConflict->exclRelIndex = new_index;
@@ -719,6 +815,12 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
ChangeVarNodes_walker(node, &context);
}
+void
+ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+{
+ ChangeVarNodesExtended(node, rt_index, new_index, sublevels_up, true);
+}
+
/*
* Substitute newrelid for oldrelid in a Relid set
*
@@ -736,7 +838,8 @@ adjust_relid_set(Relids relids, int oldrelid, int newrelid)
relids = bms_copy(relids);
/* Remove old, add new */
relids = bms_del_member(relids, oldrelid);
- relids = bms_add_member(relids, newrelid);
+ if (!IS_SPECIAL_VARNO(newrelid))
+ relids = bms_add_member(relids, newrelid);
}
return relids;
}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 38cb9e970d..c80273a521 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -987,6 +987,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_elimination", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_elimination,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 52d44f4302..404c0c4a1f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -188,6 +188,11 @@ typedef struct PlannerGlobal
* Not all fields are printed. (In some cases, there is no print support for
* the field type; in others, doing so would lead to infinite recursion or
* bloat dump output more than seems useful.)
+ *
+ * NOTE: When adding new entries containing relids and relid bitmapsets,
+ * remember to check that they will be correctly processed by
+ * the remove_self_join_rel function - relid of removing relation will be
+ * correctly replaced with the keeping one.
*----------
*/
#ifndef HAVE_PLANNERINFO_TYPEDEF
@@ -740,7 +745,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -979,7 +984,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3450,4 +3455,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index bcf8ed645c..4c98437286 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -192,6 +192,8 @@ extern SortGroupClause *get_sortgroupref_clause_noerr(Index sortref,
* output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0020 /* recurse into PlaceHolderVar
* arguments */
+#define PVC_INCLUDE_CONVERTROWTYPES 0x0040 /* include ConvertRowtypeExprs in
+ * output list */
extern Bitmapset *pull_varnos(PlannerInfo *root, Node *node);
extern Bitmapset *pull_varnos_of_level(PlannerInfo *root, Node *node, int levelsup);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 46955d128f..bc5dfd7db4 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index fee3378bbe..5a93019961 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_elimination;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -113,6 +114,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index 512823033b..f0aaa67869 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -41,11 +42,14 @@ typedef enum ReplaceVarsNoMatchOption
} ReplaceVarsNoMatchOption;
+extern Bitmapset *replace_relid(Relids relids, int oldId, int newId);
extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
List *src_rtable, List *src_perminfos);
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
extern void ChangeVarNodes(Node *node, int rt_index, int new_index,
int sublevels_up);
+void ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef);
extern void IncrementVarSublevelsUp(Node *node, int delta_sublevels_up,
int min_sublevels_up);
extern void IncrementVarSublevelsUp_rtable(List *rtable,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 5622750500..ad8ab294ff 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -434,6 +434,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 079fcf46f0..775036b264 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5969,6 +5969,25 @@ select c.id, ss.a from c
-> Seq Scan on c
(7 rows)
+explain (costs off)
+SELECT q.val FROM b LEFT JOIN (
+ SELECT (q1.z IS NOT NULL) AS val
+ FROM b LEFT JOIN (
+ SELECT (t1.b_id IS NOT NULL) AS z FROM a t1 LEFT JOIN a t2 USING (id)
+ ) AS q1
+ ON true
+) AS q ON true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on b
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on b b_1
+ -> Materialize
+ -> Seq Scan on a t1
+(7 rows)
+
CREATE TEMP TABLE parted_b (id int PRIMARY KEY) partition by range(id);
CREATE TEMP TABLE parted_b1 partition of parted_b for values from (0) to (10);
-- test join removals on a partitioned table
@@ -6380,6 +6399,1068 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of semi_rhs_exprs links from upper-level semi join to
+-- the removing relation
+explain (verbose, costs off)
+select t1.a from sj t1 where t1.b in (
+ select t2.b from sj t2 join sj t3 on t2.c=t3.c);
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Semi Join
+ Output: t1.a
+ Join Filter: (t1.b = t3.b)
+ -> Seq Scan on public.sj t1
+ Output: t1.a, t1.b, t1.c
+ -> Materialize
+ Output: t3.c, t3.b
+ -> Seq Scan on public.sj t3
+ Output: t3.c, t3.b
+ Filter: (t3.c IS NOT NULL)
+(10 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+---------------------------
+ HashSetOp Except All
+ -> Seq Scan on emp1 c2
+ -> Seq Scan on emp1 c3
+(3 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 352abc0bd4..83228cfca2 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -168,10 +168,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_elimination | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 28ed7910d0..7fc2159349 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 779d56cb30..68b5529e8f 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2175,6 +2175,17 @@ select c.id, ss.a from c
left join (select d.a from onerow, d left join b on d.a = b.id) ss
on c.id = ss.a;
+-- check the case when placeholder relates on an outer join and its inner in the
+-- phrels field but actually uses only outer side of the join.
+explain (costs off)
+SELECT q.val FROM b LEFT JOIN (
+ SELECT (q1.z IS NOT NULL) AS val
+ FROM b LEFT JOIN (
+ SELECT (t1.b_id IS NOT NULL) AS z FROM a t1 LEFT JOIN a t2 USING (id)
+ ) AS q1
+ ON true
+) AS q ON true;
+
CREATE TEMP TABLE parted_b (id int PRIMARY KEY) partition by range(id);
CREATE TEMP TABLE parted_b1 partition of parted_b for values from (0) to (10);
@@ -2409,6 +2420,489 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of semi_rhs_exprs links from upper-level semi join to
+-- the removing relation
+explain (verbose, costs off)
+select t1.a from sj t1 where t1.b in (
+ select t2.b from sj t2 join sj t3 on t2.c=t3.c);
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a2644a2e65..a734ba5098 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -394,6 +394,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2592,6 +2593,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -4048,6 +4050,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.5
Hi Andrei!
On Fri, Jan 31, 2025 at 6:57 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 1/10/25 10:13, Alexander Korotkov wrote:
I've got an off-list report from Alexander Lakhin. The failing query
is added to the regression tests in the revised patch. The query was
failing with an error "negative bitmapset member not allowed" issued
in adjust_relid_set(). In order to fix that I've to teach
adjust_relid_set() about negative newrelid, it became even more
similar to replace_relid(). Notable this error happens in
remove_leftjoinrel_from_query(). This seems to be consequence that we
made self-join removal and left-joins removal use common
infrastructure.I have analysed your change. It pertains to the situation where PHV
references both the outer join and its inner join in phrels but actually
utilises only the outer side of the join.I agree with your proposed fix and have just rewritten the query to make
it simpler and more consistent with the regression tests (see new
version of the patch in attachment).Additionally, I noticed that the remove_useless_joins code uses the term
"join removal" in the join.sql tests. I wonder if it might be more
appropriate to revert the name from "Self Join Elimination" back to
"Self Join Removal."
Great, thank you.
Regarding adjust_relid_set() and replace_relid(). I think they are
now strictly equivalent, except for the case then old relid is given
and not found. In this case adjust_relid_set() returns the original
relids while replace_relid() returns a copy. The behavior of
adjust_relid_set() appears more desirable as we don't need extra
copying when no modification is done. So, I've replaced all
replace_relid() with adjust_relid_set().
Also, I did some grammar correction to your new comment in tests.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v11-0001-Remove-useless-self-joins.patchapplication/octet-stream; name=v11-0001-Remove-useless-self-joins.patchDownload
From 88e1344f65bbf97cefee860ba211d0134b8ddbdc Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Fri, 20 Dec 2024 15:55:09 +0700
Subject: [PATCH v11] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
improves total planner prediction for the query.
This feature is dedicated to avoiding redundancy which can appear after pull-up
transformations or the creation of an EquivalenceClass-derived clause like
the below:
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x=t1.x);
SELECT * FROM t1,t2, t1 t3 WHERE t1.x=t2.x AND t2.x=t3.x;
After some additional coding, it would potentially reduce redundancy caused by
subquery pull-up after unnecessary outer join removal. Something like that
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x=t1.x);
Also, it can drastically help to join partitioned tables, removing entries even
before their expansion.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/equivclass.c | 3 +-
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1238 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 7 +-
src/backend/rewrite/rewriteManip.c | 113 +-
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 40 +-
src/include/optimizer/optimizer.h | 2 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/include/rewrite/rewriteManip.h | 4 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1083 ++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 494 ++++++++
src/tools/pgindent/typedefs.list | 3 +
19 files changed, 2973 insertions(+), 142 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 38244409e3c..5e7423bacba 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5538,6 +5538,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_elimination" xreflabel="enable_self_join_elimination">
+ <term><varname>enable_self_join_elimination</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_elimination</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 7cafaca33c5..0f9ecf5ee8b 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -852,7 +852,8 @@ find_computable_ec_member(PlannerInfo *root,
exprvars = pull_var_clause((Node *) exprs,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ PVC_INCLUDE_PLACEHOLDERS |
+ PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc, ec->ec_members)
{
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 6e2051efc65..a43ca16d683 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4162,6 +4162,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -4217,6 +4233,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -4268,6 +4285,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -4310,7 +4345,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index b33fc671775..abacc8fe5fb 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -30,27 +31,47 @@
#include "optimizer/placeholder.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. It is needed to perform a sorting procedure and simplify
+ * the search of SJE-candidate baserels referencing the same database relation.
+ * Having collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts to
+ * remove self-joins.
+ * Preliminary sorting prevents quadratic behaviour that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_elimination;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
- int relid, int ojrelid);
+ int relid, int ojrelid, int subst);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -88,7 +109,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -276,7 +297,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -288,36 +309,31 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = adjust_relid_set(root->all_baserels, relid, subst);
+ root->outer_join_rels = adjust_relid_set(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = adjust_relid_set(root->all_query_rels, relid, subst);
+ root->all_query_rels = adjust_relid_set(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -341,20 +357,33 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
sjinf->min_righthand = bms_copy(sjinf->min_righthand);
sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
- /* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ /* Now remove relid from the sets: */
+ sjinf->min_lefthand = adjust_relid_set(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = adjust_relid_set(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = adjust_relid_set(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = adjust_relid_set(sjinf->syn_righthand, relid, subst);
+
+ if (sjinfo != NULL)
+ {
+ Assert(subst <= 0 && ojrelid > 0);
+
+ /* Remove ojrelid bits from the sets: */
+ sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
+ sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
+ sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
+ sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ /* relid cannot appear in these fields, but ojrelid can: */
+ sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
+ sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
+ sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
+ sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ }
+ else
+ {
+ Assert(subst > 0 && ojrelid == -1);
+
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
+ }
}
/*
@@ -375,10 +404,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -388,21 +417,111 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
/* Reduce ph_needed to contain only "relation 0"; see below */
if (bms_is_member(0, phinfo->ph_needed))
phinfo->ph_needed = bms_make_singleton(0);
else
phinfo->ph_needed = NULL;
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst);
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = adjust_relid_set(phv->phrels, relid, subst);
+ phv->phrels = adjust_relid_set(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+ /*
+ * Likewise remove references from EquivalenceClasses.
+ */
+ foreach(l, root->eq_classes)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
+
+ if (bms_is_member(relid, ec->ec_relids) ||
+ (sjinfo == NULL || bms_is_member(ojrelid, ec->ec_relids)))
+ remove_rel_from_eclass(ec, relid, ojrelid, subst);
+ }
+
+ /*
+ * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
+ * ph_needed relid sets. These have to be known accurately, else we may
+ * fail to remove other now-removable outer joins. And our removal of the
+ * join clause(s) for this outer join may mean that Vars that were
+ * formerly needed no longer are. So we have to do this honestly by
+ * repeating the construction of those relid sets. We can cheat to one
+ * small extent: we can avoid re-examining the targetlist and HAVING qual
+ * by preserving "relation 0" bits from the existing relid sets. This is
+ * safe because we'd never remove such references.
+ *
+ * So, start by removing all other bits from attr_needed sets and
+ * lateral_vars lists. (We already did this above for ph_needed.)
+ */
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[rti];
+ int attroff;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == rti); /* sanity check on array */
+
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0;
+ attroff--)
+ {
+ if (bms_is_member(0, otherrel->attr_needed[attroff]))
+ otherrel->attr_needed[attroff] = bms_make_singleton(0);
+ else
+ otherrel->attr_needed[attroff] = NULL;
+ }
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
+ }
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ RelOptInfo *rel = find_base_rel(root, relid);
+ int ojrelid = sjinfo->ojrelid;
+ Relids joinrelids;
+ Relids join_plus_commute;
+ List *joininfos;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
+
/*
* Remove any joinquals referencing the rel from the joininfo lists.
*
@@ -465,18 +584,6 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
}
}
- /*
- * Likewise remove references from EquivalenceClasses.
- */
- foreach(l, root->eq_classes)
- {
- EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
-
- if (bms_is_member(relid, ec->ec_relids) ||
- bms_is_member(ojrelid, ec->ec_relids))
- remove_rel_from_eclass(ec, relid, ojrelid);
- }
-
/*
* There may be references to the rel in root->fkey_list, but if so,
* match_foreign_keys_to_quals() will get rid of them.
@@ -492,42 +599,6 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
/* And nuke the RelOptInfo, just in case there's another access path */
pfree(rel);
- /*
- * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
- * ph_needed relid sets. These have to be known accurately, else we may
- * fail to remove other now-removable outer joins. And our removal of the
- * join clause(s) for this outer join may mean that Vars that were
- * formerly needed no longer are. So we have to do this honestly by
- * repeating the construction of those relid sets. We can cheat to one
- * small extent: we can avoid re-examining the targetlist and HAVING qual
- * by preserving "relation 0" bits from the existing relid sets. This is
- * safe because we'd never remove such references.
- *
- * So, start by removing all other bits from attr_needed sets. (We
- * already did this above for ph_needed.)
- */
- for (rti = 1; rti < root->simple_rel_array_size; rti++)
- {
- RelOptInfo *otherrel = root->simple_rel_array[rti];
- int attroff;
-
- /* there may be empty slots corresponding to non-baserel RTEs */
- if (otherrel == NULL)
- continue;
-
- Assert(otherrel->relid == rti); /* sanity check on array */
-
- for (attroff = otherrel->max_attr - otherrel->min_attr;
- attroff >= 0;
- attroff--)
- {
- if (bms_is_member(0, otherrel->attr_needed[attroff]))
- otherrel->attr_needed[attroff] = bms_make_singleton(0);
- else
- otherrel->attr_needed[attroff] = NULL;
- }
- }
-
/*
* Now repeat construction of attr_needed bits coming from all other
* sources.
@@ -607,13 +678,13 @@ remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid)
* level(s).
*/
static void
-remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
+remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid, int subst)
{
ListCell *lc;
/* Fix up the EC's overall relids */
- ec->ec_relids = bms_del_member(ec->ec_relids, relid);
- ec->ec_relids = bms_del_member(ec->ec_relids, ojrelid);
+ ec->ec_relids = adjust_relid_set(ec->ec_relids, relid, subst);
+ ec->ec_relids = adjust_relid_set(ec->ec_relids, ojrelid, subst);
/*
* Fix up the member expressions. Any non-const member that ends with
@@ -625,11 +696,11 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc);
if (bms_is_member(relid, cur_em->em_relids) ||
- bms_is_member(ojrelid, cur_em->em_relids))
+ (ojrelid != -1 && bms_is_member(ojrelid, cur_em->em_relids)))
{
Assert(!cur_em->em_is_const);
- cur_em->em_relids = bms_del_member(cur_em->em_relids, relid);
- cur_em->em_relids = bms_del_member(cur_em->em_relids, ojrelid);
+ cur_em->em_relids = adjust_relid_set(cur_em->em_relids, relid, subst);
+ cur_em->em_relids = adjust_relid_set(cur_em->em_relids, ojrelid, subst);
if (bms_is_empty(cur_em->em_relids))
ec->ec_members = foreach_delete_current(ec->ec_members, lc);
}
@@ -640,7 +711,10 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
- remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
+ if (ojrelid == -1)
+ ChangeVarNodes((Node *) rinfo, relid, subst, 0);
+ else
+ remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
}
/*
@@ -844,9 +918,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * extra_clauses contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the extra_clauses, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +938,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,9 +1256,33 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
+ * additional clauses from a baserestrictinfo list that were used to prove
+ * uniqueness. A non NULL 'extra_clauses' indicates that we're checking
+ * for self-join and correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1193,17 +1297,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels, because we need extra clauses to be valid for our case.
+ * Also, for self-join checking we've filtered the clauses list. Thus,
+ * for a self-join search, we can match only the result cached for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1335,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1349,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1404,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1312,17 +1435,898 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+
+ foreach_node(EquivalenceMember, em, ec->ec_members)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = adjust_relid_set(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = adjust_relid_set(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach_node(EquivalenceMember, other, new_members)
+ {
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach_node(RestrictInfo, rinfo, ec->ec_sources)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach_node(RestrictInfo, other, new_sources)
+ {
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = adjust_relid_set(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ *
+ * NOTE: Remember to keep the code in sync with PlannerInfo to be sure all
+ * cached relids and relid bitmapsets can be correctly cleaned during the self
+ * join elimination procedure.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach_node(RestrictInfo, rinfo, joininfos)
+ {
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach_node(RestrictInfo, rinfo, toRemove->baserestrictinfo)
+ {
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach_node(RestrictInfo, rinfo, binfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->baserestrictinfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach_node(RestrictInfo, rinfo, jinfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->joininfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = adjust_relid_set(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Replace varno in all the query structures, except nodes RangeTblRef
+ * otherwise later remove_rel_from_joinlist will yield errors.
+ */
+ ChangeVarNodesExtended((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ adjust_relid_set(root->all_result_relids, toRemove->relid, toKeep->relid);
+ adjust_relid_set(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+
+ /*
+ * Now repeat construction of attr_needed bits coming from all other
+ * sources.
+ */
+ rebuild_placeholder_attr_needed(root);
+ rebuild_joinclause_attr_needed(root);
+ rebuild_eclass_attr_needed(root);
+ rebuild_lateral_attr_needed(root);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach_node(RestrictInfo, rinfo, joinquals)
+ {
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ foreach_node(RestrictInfo, rinfo, uclauses)
+ {
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach_node(RestrictInfo, orinfo, outer->baserestrictinfo)
+ {
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+ if (restrictlist == NIL)
+ continue;
+
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ Assert(list_length(restrictlist) ==
+ (list_length(selfjoinquals) + list_length(otherjoinquals)));
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * uclauses is the copy of outer->baserestrictinfo that associated with a index.
+ * We already proven that based on {outer->baserestrictinfo, selfjoinquals}
+ * comparsion with unique index we can guarantees that for the outer rel there is
+ * at most one row where columns equal given values. Sometimes that is not ok. e.g.
+ * "where s1.b = s2.b and s1.a = 1 and s2.a = 2". (the index is based on column(a,b).
+ * in these scarenio, we also need to validate equality of outer->baserestrictinfo
+ * with inner->outer->baserestrictinfo,
+ * or else we won't match the same row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be removed,
+ * and these relations should not have TABLESAMPLE clauses
+ * specified. Removing a relation with TABLESAMPLE clause could
+ * potentially change the syntax of the query.
+ * Because of UPDATE/DELETE EPQ meachanism, currently Query->resultRelation
+ * or Query->mergeTargetRelation associated rel cannot be eliminated.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL &&
+ varno != root->parse->resultRelation &&
+ varno != root->parse->mergeTargetRelation)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_elimination || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index ade23fd9d56..5467e094ca7 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -233,6 +233,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 7c27dc24e21..af9364e956a 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -639,7 +639,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
@@ -647,6 +647,9 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
* be an RTE corresponding to each setop's output.
+ * Note, we use this not subquery's targetlist but subroot->parse's
+ * targetlist, because it was revised by self-join removal. subquery's
+ * targetlist might contain the references to the removed relids.
*/
if (pNumGroups)
{
@@ -659,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index a115b217c91..6d56b65a4b9 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -64,7 +64,6 @@ static bool locate_windowfunc_walker(Node *node,
locate_windowfunc_context *context);
static bool checkExprHasSubLink_walker(Node *node, void *context);
static Relids offset_relid_set(Relids relids, int offset);
-static Relids adjust_relid_set(Relids relids, int oldrelid, int newrelid);
static Node *add_nulling_relids_mutator(Node *node,
add_nulling_relids_context *context);
static Node *remove_nulling_relids_mutator(Node *node,
@@ -544,6 +543,7 @@ offset_relid_set(Relids relids, int offset)
* to 'new_index'. The varnosyn fields are changed too. Also, adjust other
* nodes that contain rangetable indexes, such as RangeTblRef and JoinExpr.
*
+ * change_RangeTblRef: do we change RangeTblRef or not. see ChangeVarNodesExtended.
* NOTE: although this has the form of a walker, we cheat and modify the
* nodes in-place. The given expression tree should have been copied
* earlier to ensure that no unwanted side-effects occur!
@@ -554,6 +554,7 @@ typedef struct
int rt_index;
int new_index;
int sublevels_up;
+ bool change_RangeTblRef;
} ChangeVarNodes_context;
static bool
@@ -586,7 +587,7 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
cexpr->cvarno = context->new_index;
return false;
}
- if (IsA(node, RangeTblRef))
+ if (IsA(node, RangeTblRef) && context->change_RangeTblRef)
{
RangeTblRef *rtr = (RangeTblRef *) node;
@@ -633,6 +634,75 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ adjust_relid_set(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ adjust_relid_set(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ adjust_relid_set(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ adjust_relid_set(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ adjust_relid_set(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ adjust_relid_set(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * for SJE (self join elimination), changing varnos will make t1.a = t2.a
+ * to "t1.a = t1.a", which is always true, we can optizmied it out.
+ * to optimzied it out, we use equal to validate,
+ * but we also need to add a not null qual (NullTest).
+ *
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
@@ -665,32 +735,31 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
return expression_tree_walker(node, ChangeVarNodes_walker, context);
}
+/*
+ * almost equivalent to ChangeVarNodes. also see comments in ChangeVarNodes.
+ * difference with ChangeVarNodes:
+ * when we use ChangeVarNodes for node we change all of it's underlying nodes.
+ * but for SJE we don't want to change node type: RangeTblRef, in some occasion.
+ * because SJE last step, remove_rel_from_joinlist is to
+ * remove left node (RangeTblRef) one by one. if in any case while using
+ * ChangeVarNodes, by accidently leaf node RangeTblRef also being updated
+ * then remove_rel_from_joinlist will have error.
+*/
void
-ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef)
{
ChangeVarNodes_context context;
context.rt_index = rt_index;
context.new_index = new_index;
context.sublevels_up = sublevels_up;
+ context.change_RangeTblRef = change_RangeTblRef;
- /*
- * Must be prepared to start with a Query or a bare expression tree; if
- * it's a Query, go straight to query_tree_walker to make sure that
- * sublevels_up doesn't get incremented prematurely.
- */
if (node && IsA(node, Query))
{
Query *qry = (Query *) node;
- /*
- * If we are starting at a Query, and sublevels_up is zero, then we
- * must also fix rangetable indexes in the Query itself --- namely
- * resultRelation, mergeTargetRelation, exclRelIndex and rowMarks
- * entries. sublevels_up cannot be zero when recursing into a
- * subquery, so there's no need to have the same logic inside
- * ChangeVarNodes_walker.
- */
if (sublevels_up == 0)
{
ListCell *l;
@@ -701,7 +770,6 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
if (qry->mergeTargetRelation == rt_index)
qry->mergeTargetRelation = new_index;
- /* this is unlikely to ever be used, but ... */
if (qry->onConflict && qry->onConflict->exclRelIndex == rt_index)
qry->onConflict->exclRelIndex = new_index;
@@ -719,6 +787,12 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
ChangeVarNodes_walker(node, &context);
}
+void
+ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+{
+ ChangeVarNodesExtended(node, rt_index, new_index, sublevels_up, true);
+}
+
/*
* Substitute newrelid for oldrelid in a Relid set
*
@@ -727,7 +801,7 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
* (Perhaps newrelid could also be a special varno, but there had better
* not be a reason to inject that into a nullingrels or phrels set.)
*/
-static Relids
+Relids
adjust_relid_set(Relids relids, int oldrelid, int newrelid)
{
if (!IS_SPECIAL_VARNO(oldrelid) && bms_is_member(oldrelid, relids))
@@ -736,7 +810,8 @@ adjust_relid_set(Relids relids, int oldrelid, int newrelid)
relids = bms_copy(relids);
/* Remove old, add new */
relids = bms_del_member(relids, oldrelid);
- relids = bms_add_member(relids, newrelid);
+ if (!IS_SPECIAL_VARNO(newrelid))
+ relids = bms_add_member(relids, newrelid);
}
return relids;
}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index ce7534d4d23..4c19b4b2b6f 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -987,6 +987,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_elimination", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_elimination,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 00c700cc3e7..fbf05322c75 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -201,6 +201,11 @@ typedef struct PlannerGlobal
* Not all fields are printed. (In some cases, there is no print support for
* the field type; in others, doing so would lead to infinite recursion or
* bloat dump output more than seems useful.)
+ *
+ * NOTE: When adding new entries containing relids and relid bitmapsets,
+ * remember to check that they will be correctly processed by
+ * the remove_self_join_rel function - relid of removing relation will be
+ * correctly replaced with the keeping one.
*----------
*/
#ifndef HAVE_PLANNERINFO_TYPEDEF
@@ -753,7 +758,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -992,7 +997,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3463,4 +3468,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index bcf8ed645c2..4c984372869 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -192,6 +192,8 @@ extern SortGroupClause *get_sortgroupref_clause_noerr(Index sortref,
* output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0020 /* recurse into PlaceHolderVar
* arguments */
+#define PVC_INCLUDE_CONVERTROWTYPES 0x0040 /* include ConvertRowtypeExprs in
+ * output list */
extern Bitmapset *pull_varnos(PlannerInfo *root, Node *node);
extern Bitmapset *pull_varnos_of_level(PlannerInfo *root, Node *node, int levelsup);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 46955d128f0..bc5dfd7db41 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index fee3378bbe3..5a930199611 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_elimination;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -113,6 +114,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index 512823033b9..7371ba3bbf0 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -41,11 +42,14 @@ typedef enum ReplaceVarsNoMatchOption
} ReplaceVarsNoMatchOption;
+extern Bitmapset *adjust_relid_set(Relids relids, int oldId, int newId);
extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
List *src_rtable, List *src_perminfos);
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
extern void ChangeVarNodes(Node *node, int rt_index, int new_index,
int sublevels_up);
+void ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef);
extern void IncrementVarSublevelsUp(Node *node, int delta_sublevels_up,
int min_sublevels_up);
extern void IncrementVarSublevelsUp_rtable(List *rtable,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 56227505009..ad8ab294ff6 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -434,6 +434,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 3ffc066b1f8..a57bb18c24f 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5966,6 +5966,27 @@ select c.id, ss.a from c
-> Seq Scan on c
(7 rows)
+-- check the case when the placeholder relates to an outer join and its
+-- inner in the press field but actually uses only the outer side of the join
+explain (costs off)
+SELECT q.val FROM b LEFT JOIN (
+ SELECT (q1.z IS NOT NULL) AS val
+ FROM b LEFT JOIN (
+ SELECT (t1.b_id IS NOT NULL) AS z FROM a t1 LEFT JOIN a t2 USING (id)
+ ) AS q1
+ ON true
+) AS q ON true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on b
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on b b_1
+ -> Materialize
+ -> Seq Scan on a t1
+(7 rows)
+
CREATE TEMP TABLE parted_b (id int PRIMARY KEY) partition by range(id);
CREATE TEMP TABLE parted_b1 partition of parted_b for values from (0) to (10);
-- test join removals on a partitioned table
@@ -6377,6 +6398,1068 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of semi_rhs_exprs links from upper-level semi join to
+-- the removing relation
+explain (verbose, costs off)
+select t1.a from sj t1 where t1.b in (
+ select t2.b from sj t2 join sj t3 on t2.c=t3.c);
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Semi Join
+ Output: t1.a
+ Join Filter: (t1.b = t3.b)
+ -> Seq Scan on public.sj t1
+ Output: t1.a, t1.b, t1.c
+ -> Materialize
+ Output: t3.c, t3.b
+ -> Seq Scan on public.sj t3
+ Output: t3.c, t3.b
+ Filter: (t3.c IS NOT NULL)
+(10 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+---------------------------
+ HashSetOp Except All
+ -> Seq Scan on emp1 c2
+ -> Seq Scan on emp1 c3
+(3 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 352abc0bd42..83228cfca29 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -168,10 +168,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_elimination | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 28ed7910d01..7fc2159349b 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c7349eab933..c29d13b9fed 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2175,6 +2175,17 @@ select c.id, ss.a from c
left join (select d.a from onerow, d left join b on d.a = b.id) ss
on c.id = ss.a;
+-- check the case when the placeholder relates to an outer join and its
+-- inner in the press field but actually uses only the outer side of the join
+explain (costs off)
+SELECT q.val FROM b LEFT JOIN (
+ SELECT (q1.z IS NOT NULL) AS val
+ FROM b LEFT JOIN (
+ SELECT (t1.b_id IS NOT NULL) AS z FROM a t1 LEFT JOIN a t2 USING (id)
+ ) AS q1
+ ON true
+) AS q ON true;
+
CREATE TEMP TABLE parted_b (id int PRIMARY KEY) partition by range(id);
CREATE TEMP TABLE parted_b1 partition of parted_b for values from (0) to (10);
@@ -2409,6 +2420,489 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of semi_rhs_exprs links from upper-level semi join to
+-- the removing relation
+explain (verbose, costs off)
+select t1.a from sj t1 where t1.b in (
+ select t2.b from sj t2 join sj t3 on t2.c=t3.c);
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9a3bee93dec..6fd426ab00c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -394,6 +394,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2592,6 +2593,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -4050,6 +4052,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.5 (Apple Git-154)
On 9/2/2025 18:41, Alexander Korotkov wrote:
Regarding adjust_relid_set() and replace_relid(). I think they are
now strictly equivalent, except for the case then old relid is given
and not found. In this case adjust_relid_set() returns the original
relids while replace_relid() returns a copy. The behavior of
adjust_relid_set() appears more desirable as we don't need extra
copying when no modification is done. So, I've replaced all
replace_relid() with adjust_relid_set().
Ok, I glanced into it, and it makes sense to merge these routines.
I think the comment to adjust_relid_set() should be arranged, too. See
the attachment for a short variant of such modification.
Also, I did some grammar correction to your new comment in tests.
Thanks!
--
regards, Andrei Lepikhov
Attachments:
comment.difftext/plain; charset=UTF-8; name=comment.diffDownload
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index 6d56b65a4b..38c3e4e225 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -794,12 +794,10 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
}
/*
- * Substitute newrelid for oldrelid in a Relid set
- *
+ * Remove oldrelid from a Relid set and replace it with newrelid in case it is
+ * not a special varno.
* Note: some extensions may pass a special varno such as INDEX_VAR for
* oldrelid. bms_is_member won't like that, but we should tolerate it.
- * (Perhaps newrelid could also be a special varno, but there had better
- * not be a reason to inject that into a nullingrels or phrels set.)
*/
Relids
adjust_relid_set(Relids relids, int oldrelid, int newrelid)
Hi!
On Mon, Feb 10, 2025 at 7:19 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 9/2/2025 18:41, Alexander Korotkov wrote:
Regarding adjust_relid_set() and replace_relid(). I think they are
now strictly equivalent, except for the case then old relid is given
and not found. In this case adjust_relid_set() returns the original
relids while replace_relid() returns a copy. The behavior of
adjust_relid_set() appears more desirable as we don't need extra
copying when no modification is done. So, I've replaced all
replace_relid() with adjust_relid_set().Ok, I glanced into it, and it makes sense to merge these routines.
I think the comment to adjust_relid_set() should be arranged, too. See
the attachment for a short variant of such modification.Also, I did some grammar correction to your new comment in tests.
Thanks!
I've further revised adjust_relid_set() header comment.
Looking back to the work done since previous attempt to commit this to
pg17, I can highlight following.
1) We're now using more of existing infrastructure including
adjust_relid_set() and ChangeVarNodes(). The most of complexity is
still there though.
2) We've checked few ways to further simplify this patch. But yet the
current way still feels to be best possible.
3) For sure, several bugs were fixed.
I think we could give it another chance for pg18 after some further
polishing (at least commit message still needs to be revised). Any
thoughts on this? Tom?
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v12-0001-Remove-useless-self-joins.patchapplication/octet-stream; name=v12-0001-Remove-useless-self-joins.patchDownload
From 4f7cc228092a6a1cf86d867971cd91c219b21832 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Fri, 20 Dec 2024 15:55:09 +0700
Subject: [PATCH v12] Remove useless self-joins
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
improves total planner prediction for the query.
This feature is dedicated to avoiding redundancy which can appear after pull-up
transformations or the creation of an EquivalenceClass-derived clause like
the below:
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x=t1.x);
SELECT * FROM t1,t2, t1 t3 WHERE t1.x=t2.x AND t2.x=t3.x;
After some additional coding, it would potentially reduce redundancy caused by
subquery pull-up after unnecessary outer join removal. Something like that
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x=t1.x);
Also, it can drastically help to join partitioned tables, removing entries even
before their expansion.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/equivclass.c | 3 +-
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1243 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 9 +-
src/backend/rewrite/rewriteManip.c | 124 +-
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 40 +-
src/include/optimizer/optimizer.h | 2 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/include/rewrite/rewriteManip.h | 4 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1083 ++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 494 ++++++++
src/tools/pgindent/typedefs.list | 3 +
19 files changed, 2985 insertions(+), 148 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 38244409e3c..5e7423bacba 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5538,6 +5538,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_elimination" xreflabel="enable_self_join_elimination">
+ <term><varname>enable_self_join_elimination</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_elimination</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 7cafaca33c5..0f9ecf5ee8b 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -852,7 +852,8 @@ find_computable_ec_member(PlannerInfo *root,
exprvars = pull_var_clause((Node *) exprs,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ PVC_INCLUDE_PLACEHOLDERS |
+ PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc, ec->ec_members)
{
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 6e2051efc65..a43ca16d683 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4162,6 +4162,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -4217,6 +4233,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -4268,6 +4285,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -4310,7 +4345,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index b33fc671775..d0d9b80be9c 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -30,27 +31,47 @@
#include "optimizer/placeholder.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. It is needed to perform a sorting procedure and simplify
+ * the search of SJE-candidate baserels referencing the same database relation.
+ * Having collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts to
+ * remove self-joins.
+ * Preliminary sorting prevents quadratic behaviour that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_elimination;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
- int relid, int ojrelid);
+ int relid, int ojrelid, int subst);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -88,7 +109,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -276,7 +297,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -288,36 +309,31 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = adjust_relid_set(root->all_baserels, relid, subst);
+ root->outer_join_rels = adjust_relid_set(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = adjust_relid_set(root->all_query_rels, relid, subst);
+ root->all_query_rels = adjust_relid_set(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -341,20 +357,33 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
sjinf->min_righthand = bms_copy(sjinf->min_righthand);
sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
- /* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ /* Now remove relid from the sets: */
+ sjinf->min_lefthand = adjust_relid_set(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = adjust_relid_set(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = adjust_relid_set(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = adjust_relid_set(sjinf->syn_righthand, relid, subst);
+
+ if (sjinfo != NULL)
+ {
+ Assert(subst <= 0 && ojrelid > 0);
+
+ /* Remove ojrelid bits from the sets: */
+ sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
+ sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
+ sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
+ sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ /* relid cannot appear in these fields, but ojrelid can: */
+ sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
+ sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
+ sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
+ sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ }
+ else
+ {
+ Assert(subst > 0 && ojrelid == -1);
+
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
+ }
}
/*
@@ -375,10 +404,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -388,21 +417,112 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
/* Reduce ph_needed to contain only "relation 0"; see below */
if (bms_is_member(0, phinfo->ph_needed))
phinfo->ph_needed = bms_make_singleton(0);
else
phinfo->ph_needed = NULL;
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst);
+
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = adjust_relid_set(phv->phrels, relid, subst);
+ phv->phrels = adjust_relid_set(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+ /*
+ * Likewise remove references from EquivalenceClasses.
+ */
+ foreach(l, root->eq_classes)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
+
+ if (bms_is_member(relid, ec->ec_relids) ||
+ (sjinfo == NULL || bms_is_member(ojrelid, ec->ec_relids)))
+ remove_rel_from_eclass(ec, relid, ojrelid, subst);
+ }
+
+ /*
+ * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
+ * ph_needed relid sets. These have to be known accurately, else we may
+ * fail to remove other now-removable outer joins. And our removal of the
+ * join clause(s) for this outer join may mean that Vars that were
+ * formerly needed no longer are. So we have to do this honestly by
+ * repeating the construction of those relid sets. We can cheat to one
+ * small extent: we can avoid re-examining the targetlist and HAVING qual
+ * by preserving "relation 0" bits from the existing relid sets. This is
+ * safe because we'd never remove such references.
+ *
+ * So, start by removing all other bits from attr_needed sets and
+ * lateral_vars lists. (We already did this above for ph_needed.)
+ */
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[rti];
+ int attroff;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == rti); /* sanity check on array */
+
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0;
+ attroff--)
+ {
+ if (bms_is_member(0, otherrel->attr_needed[attroff]))
+ otherrel->attr_needed[attroff] = bms_make_singleton(0);
+ else
+ otherrel->attr_needed[attroff] = NULL;
+ }
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
+ }
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ RelOptInfo *rel = find_base_rel(root, relid);
+ int ojrelid = sjinfo->ojrelid;
+ Relids joinrelids;
+ Relids join_plus_commute;
+ List *joininfos;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
+
/*
* Remove any joinquals referencing the rel from the joininfo lists.
*
@@ -465,18 +585,6 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
}
}
- /*
- * Likewise remove references from EquivalenceClasses.
- */
- foreach(l, root->eq_classes)
- {
- EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
-
- if (bms_is_member(relid, ec->ec_relids) ||
- bms_is_member(ojrelid, ec->ec_relids))
- remove_rel_from_eclass(ec, relid, ojrelid);
- }
-
/*
* There may be references to the rel in root->fkey_list, but if so,
* match_foreign_keys_to_quals() will get rid of them.
@@ -492,42 +600,6 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
/* And nuke the RelOptInfo, just in case there's another access path */
pfree(rel);
- /*
- * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
- * ph_needed relid sets. These have to be known accurately, else we may
- * fail to remove other now-removable outer joins. And our removal of the
- * join clause(s) for this outer join may mean that Vars that were
- * formerly needed no longer are. So we have to do this honestly by
- * repeating the construction of those relid sets. We can cheat to one
- * small extent: we can avoid re-examining the targetlist and HAVING qual
- * by preserving "relation 0" bits from the existing relid sets. This is
- * safe because we'd never remove such references.
- *
- * So, start by removing all other bits from attr_needed sets. (We
- * already did this above for ph_needed.)
- */
- for (rti = 1; rti < root->simple_rel_array_size; rti++)
- {
- RelOptInfo *otherrel = root->simple_rel_array[rti];
- int attroff;
-
- /* there may be empty slots corresponding to non-baserel RTEs */
- if (otherrel == NULL)
- continue;
-
- Assert(otherrel->relid == rti); /* sanity check on array */
-
- for (attroff = otherrel->max_attr - otherrel->min_attr;
- attroff >= 0;
- attroff--)
- {
- if (bms_is_member(0, otherrel->attr_needed[attroff]))
- otherrel->attr_needed[attroff] = bms_make_singleton(0);
- else
- otherrel->attr_needed[attroff] = NULL;
- }
- }
-
/*
* Now repeat construction of attr_needed bits coming from all other
* sources.
@@ -607,13 +679,13 @@ remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid)
* level(s).
*/
static void
-remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
+remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid, int subst)
{
ListCell *lc;
/* Fix up the EC's overall relids */
- ec->ec_relids = bms_del_member(ec->ec_relids, relid);
- ec->ec_relids = bms_del_member(ec->ec_relids, ojrelid);
+ ec->ec_relids = adjust_relid_set(ec->ec_relids, relid, subst);
+ ec->ec_relids = adjust_relid_set(ec->ec_relids, ojrelid, subst);
/*
* Fix up the member expressions. Any non-const member that ends with
@@ -625,11 +697,11 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc);
if (bms_is_member(relid, cur_em->em_relids) ||
- bms_is_member(ojrelid, cur_em->em_relids))
+ (ojrelid != -1 && bms_is_member(ojrelid, cur_em->em_relids)))
{
Assert(!cur_em->em_is_const);
- cur_em->em_relids = bms_del_member(cur_em->em_relids, relid);
- cur_em->em_relids = bms_del_member(cur_em->em_relids, ojrelid);
+ cur_em->em_relids = adjust_relid_set(cur_em->em_relids, relid, subst);
+ cur_em->em_relids = adjust_relid_set(cur_em->em_relids, ojrelid, subst);
if (bms_is_empty(cur_em->em_relids))
ec->ec_members = foreach_delete_current(ec->ec_members, lc);
}
@@ -640,7 +712,10 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
- remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
+ if (ojrelid == -1)
+ ChangeVarNodes((Node *) rinfo, relid, subst, 0);
+ else
+ remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
}
/*
@@ -844,9 +919,14 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * extra_clauses contains the right sides of baserestrictinfo clauses looking
+ * like x = const if distinctness is derived from such clauses, not joininfo
+ * clause. Pass NULL to the extra_clauses, if its value is not needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +939,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,9 +1257,33 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
+ * additional clauses from a baserestrictinfo list that were used to prove
+ * uniqueness. A non NULL 'extra_clauses' indicates that we're checking
+ * for self-join and correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1193,17 +1298,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels, because we need extra clauses to be valid for our case.
+ * Also, for self-join checking we've filtered the clauses list. Thus,
+ * for a self-join search, we can match only the result cached for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1336,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1350,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1405,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1312,17 +1436,902 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list, and updated these clauses to reference the remaining
+ * relation, so we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there are join clauses that join the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler to just
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+
+ foreach_node(EquivalenceMember, em, ec->ec_members)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = adjust_relid_set(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = adjust_relid_set(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach_node(EquivalenceMember, other, new_members)
+ {
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach_node(RestrictInfo, rinfo, ec->ec_sources)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach_node(RestrictInfo, other, new_sources)
+ {
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = adjust_relid_set(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ *
+ * NOTE: Remember to keep the code in sync with PlannerInfo to be sure all
+ * cached relids and relid bitmapsets can be correctly cleaned during the self
+ * join elimination procedure.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach_node(RestrictInfo, rinfo, joininfos)
+ {
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach_node(RestrictInfo, rinfo, toRemove->baserestrictinfo)
+ {
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+ foreach_node(RestrictInfo, rinfo, binfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->baserestrictinfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Const and non-const expressions can't be equal */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ foreach_node(RestrictInfo, rinfo, jinfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, toKeep->joininfo)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL
+ && src->parent_ec == rinfo->parent_ec)
+ || restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = adjust_relid_set(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation, because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Replace varno in all the query structures, except nodes RangeTblRef
+ * otherwise later remove_rel_from_joinlist will yield errors.
+ */
+ ChangeVarNodesExtended((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ adjust_relid_set(root->all_result_relids, toRemove->relid, toKeep->relid);
+ adjust_relid_set(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path */
+ pfree(toRemove);
+
+ /*
+ * Now repeat construction of attr_needed bits coming from all other
+ * sources.
+ */
+ rebuild_placeholder_attr_needed(root);
+ rebuild_joinclause_attr_needed(root);
+ rebuild_eclass_attr_needed(root);
+ rebuild_lateral_attr_needed(root);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match, and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach_node(RestrictInfo, rinfo, joinquals)
+ {
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ foreach_node(RestrictInfo, rinfo, uclauses)
+ {
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach_node(RestrictInfo, orinfo, outer->baserestrictinfo)
+ {
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate join of two relations if they
+ * belong to different rules of order. Otherwise planner can't be
+ * able to find any variants of correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g. one is
+ * locked FOR UPDATE and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses, required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+ if (restrictlist == NIL)
+ continue;
+
+ /*
+ * Process restrictlist to separate the self join quals out of the
+ * other quals. e.g x = x goes to selfjoinquals and a = b to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ Assert(list_length(restrictlist) ==
+ (list_length(selfjoinquals) + list_length(otherjoinquals)));
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * uclauses is the copy of outer->baserestrictinfo that associated
+ * with a index. We already proven that based on
+ * {outer->baserestrictinfo, selfjoinquals} comparsion with unique
+ * index we can guarantees that for the outer rel there is at most
+ * one row where columns equal given values. Sometimes that is not
+ * ok. e.g. "where s1.b = s2.b and s1.a = 1 and s2.a = 2". (the
+ * index is based on column(a,b). in these scarenio, we also need
+ * to validate equality of outer->baserestrictinfo with
+ * inner->outer->baserestrictinfo, or else we won't match the same
+ * row on each side of the join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be
+ * removed, and these relations should not have TABLESAMPLE
+ * clauses specified. Removing a relation with TABLESAMPLE clause
+ * could potentially change the syntax of the query. Because of
+ * UPDATE/DELETE EPQ meachanism, currently Query->resultRelation
+ * or Query->mergeTargetRelation associated rel cannot be
+ * eliminated.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL &&
+ varno != root->parse->resultRelation &&
+ varno != root->parse->mergeTargetRelation)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_elimination || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index ade23fd9d56..5467e094ca7 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -233,6 +233,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 7c27dc24e21..eab44da65b8 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -639,14 +639,17 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
* generate_append_tlist, and those would confuse estimate_num_groups
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
- * be an RTE corresponding to each setop's output.
+ * be an RTE corresponding to each setop's output. Note, we use this not
+ * subquery's targetlist but subroot->parse's targetlist, because it was
+ * revised by self-join removal. subquery's targetlist might contain the
+ * references to the removed relids.
*/
if (pNumGroups)
{
@@ -659,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index a115b217c91..bc8e6c89a6f 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -64,7 +64,6 @@ static bool locate_windowfunc_walker(Node *node,
locate_windowfunc_context *context);
static bool checkExprHasSubLink_walker(Node *node, void *context);
static Relids offset_relid_set(Relids relids, int offset);
-static Relids adjust_relid_set(Relids relids, int oldrelid, int newrelid);
static Node *add_nulling_relids_mutator(Node *node,
add_nulling_relids_context *context);
static Node *remove_nulling_relids_mutator(Node *node,
@@ -544,6 +543,7 @@ offset_relid_set(Relids relids, int offset)
* to 'new_index'. The varnosyn fields are changed too. Also, adjust other
* nodes that contain rangetable indexes, such as RangeTblRef and JoinExpr.
*
+ * change_RangeTblRef: do we change RangeTblRef or not. see ChangeVarNodesExtended.
* NOTE: although this has the form of a walker, we cheat and modify the
* nodes in-place. The given expression tree should have been copied
* earlier to ensure that no unwanted side-effects occur!
@@ -554,6 +554,7 @@ typedef struct
int rt_index;
int new_index;
int sublevels_up;
+ bool change_RangeTblRef;
} ChangeVarNodes_context;
static bool
@@ -586,7 +587,7 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
cexpr->cvarno = context->new_index;
return false;
}
- if (IsA(node, RangeTblRef))
+ if (IsA(node, RangeTblRef) && context->change_RangeTblRef)
{
RangeTblRef *rtr = (RangeTblRef *) node;
@@ -633,6 +634,75 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ adjust_relid_set(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ adjust_relid_set(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ adjust_relid_set(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ adjust_relid_set(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ adjust_relid_set(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ adjust_relid_set(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * for SJE (self join elimination), changing varnos will make t1.a
+ * = t2.a to "t1.a = t1.a", which is always true, we can optizmied
+ * it out. to optimzied it out, we use equal to validate, but we
+ * also need to add a not null qual (NullTest).
+ *
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
@@ -665,32 +735,31 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
return expression_tree_walker(node, ChangeVarNodes_walker, context);
}
+/*
+ * almost equivalent to ChangeVarNodes. also see comments in ChangeVarNodes.
+ * difference with ChangeVarNodes:
+ * when we use ChangeVarNodes for node we change all of it's underlying nodes.
+ * but for SJE we don't want to change node type: RangeTblRef, in some occasion.
+ * because SJE last step, remove_rel_from_joinlist is to
+ * remove left node (RangeTblRef) one by one. if in any case while using
+ * ChangeVarNodes, by accidently leaf node RangeTblRef also being updated
+ * then remove_rel_from_joinlist will have error.
+*/
void
-ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef)
{
ChangeVarNodes_context context;
context.rt_index = rt_index;
context.new_index = new_index;
context.sublevels_up = sublevels_up;
+ context.change_RangeTblRef = change_RangeTblRef;
- /*
- * Must be prepared to start with a Query or a bare expression tree; if
- * it's a Query, go straight to query_tree_walker to make sure that
- * sublevels_up doesn't get incremented prematurely.
- */
if (node && IsA(node, Query))
{
Query *qry = (Query *) node;
- /*
- * If we are starting at a Query, and sublevels_up is zero, then we
- * must also fix rangetable indexes in the Query itself --- namely
- * resultRelation, mergeTargetRelation, exclRelIndex and rowMarks
- * entries. sublevels_up cannot be zero when recursing into a
- * subquery, so there's no need to have the same logic inside
- * ChangeVarNodes_walker.
- */
if (sublevels_up == 0)
{
ListCell *l;
@@ -701,7 +770,6 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
if (qry->mergeTargetRelation == rt_index)
qry->mergeTargetRelation = new_index;
- /* this is unlikely to ever be used, but ... */
if (qry->onConflict && qry->onConflict->exclRelIndex == rt_index)
qry->onConflict->exclRelIndex = new_index;
@@ -719,15 +787,22 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
ChangeVarNodes_walker(node, &context);
}
+void
+ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+{
+ ChangeVarNodesExtended(node, rt_index, new_index, sublevels_up, true);
+}
+
/*
- * Substitute newrelid for oldrelid in a Relid set
+ * adjust_relid_set - substitute newrelid for oldrelid in a Relid set
*
- * Note: some extensions may pass a special varno such as INDEX_VAR for
- * oldrelid. bms_is_member won't like that, but we should tolerate it.
- * (Perhaps newrelid could also be a special varno, but there had better
- * not be a reason to inject that into a nullingrels or phrels set.)
+ * Attempt to remove oldrelid from a Relid set (as long as it's not a special
+ * varno). If oldrelid was found and removed, insert newrelid into a Relid
+ * set (as long as it's not a special varno). Therefore, when oldrelid is
+ * a special varno, this function does nothing. When newrelid is a special
+ * varno, this function behaves as delete.
*/
-static Relids
+Relids
adjust_relid_set(Relids relids, int oldrelid, int newrelid)
{
if (!IS_SPECIAL_VARNO(oldrelid) && bms_is_member(oldrelid, relids))
@@ -736,7 +811,8 @@ adjust_relid_set(Relids relids, int oldrelid, int newrelid)
relids = bms_copy(relids);
/* Remove old, add new */
relids = bms_del_member(relids, oldrelid);
- relids = bms_add_member(relids, newrelid);
+ if (!IS_SPECIAL_VARNO(newrelid))
+ relids = bms_add_member(relids, newrelid);
}
return relids;
}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index ce7534d4d23..4c19b4b2b6f 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -987,6 +987,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_elimination", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_elimination,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 00c700cc3e7..fbf05322c75 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -201,6 +201,11 @@ typedef struct PlannerGlobal
* Not all fields are printed. (In some cases, there is no print support for
* the field type; in others, doing so would lead to infinite recursion or
* bloat dump output more than seems useful.)
+ *
+ * NOTE: When adding new entries containing relids and relid bitmapsets,
+ * remember to check that they will be correctly processed by
+ * the remove_self_join_rel function - relid of removing relation will be
+ * correctly replaced with the keeping one.
*----------
*/
#ifndef HAVE_PLANNERINFO_TYPEDEF
@@ -753,7 +758,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -992,7 +997,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3463,4 +3468,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index bcf8ed645c2..78e05d88c8e 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -192,6 +192,8 @@ extern SortGroupClause *get_sortgroupref_clause_noerr(Index sortref,
* output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0020 /* recurse into PlaceHolderVar
* arguments */
+#define PVC_INCLUDE_CONVERTROWTYPES 0x0040 /* include ConvertRowtypeExprs in
+ * output list */
extern Bitmapset *pull_varnos(PlannerInfo *root, Node *node);
extern Bitmapset *pull_varnos_of_level(PlannerInfo *root, Node *node, int levelsup);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 46955d128f0..bc5dfd7db41 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index fee3378bbe3..5a930199611 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_elimination;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -113,6 +114,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index 512823033b9..ddd45099a94 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -41,11 +42,14 @@ typedef enum ReplaceVarsNoMatchOption
} ReplaceVarsNoMatchOption;
+extern Bitmapset *adjust_relid_set(Relids relids, int oldId, int newId);
extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
List *src_rtable, List *src_perminfos);
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
extern void ChangeVarNodes(Node *node, int rt_index, int new_index,
int sublevels_up);
+extern void ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef);
extern void IncrementVarSublevelsUp(Node *node, int delta_sublevels_up,
int min_sublevels_up);
extern void IncrementVarSublevelsUp_rtable(List *rtable,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 56227505009..ad8ab294ff6 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -434,6 +434,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 3ffc066b1f8..a57bb18c24f 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5966,6 +5966,27 @@ select c.id, ss.a from c
-> Seq Scan on c
(7 rows)
+-- check the case when the placeholder relates to an outer join and its
+-- inner in the press field but actually uses only the outer side of the join
+explain (costs off)
+SELECT q.val FROM b LEFT JOIN (
+ SELECT (q1.z IS NOT NULL) AS val
+ FROM b LEFT JOIN (
+ SELECT (t1.b_id IS NOT NULL) AS z FROM a t1 LEFT JOIN a t2 USING (id)
+ ) AS q1
+ ON true
+) AS q ON true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on b
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on b b_1
+ -> Materialize
+ -> Seq Scan on a t1
+(7 rows)
+
CREATE TEMP TABLE parted_b (id int PRIMARY KEY) partition by range(id);
CREATE TEMP TABLE parted_b1 partition of parted_b for values from (0) to (10);
-- test join removals on a partitioned table
@@ -6377,6 +6398,1068 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of semi_rhs_exprs links from upper-level semi join to
+-- the removing relation
+explain (verbose, costs off)
+select t1.a from sj t1 where t1.b in (
+ select t2.b from sj t2 join sj t3 on t2.c=t3.c);
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Semi Join
+ Output: t1.a
+ Join Filter: (t1.b = t3.b)
+ -> Seq Scan on public.sj t1
+ Output: t1.a, t1.b, t1.c
+ -> Materialize
+ Output: t3.c, t3.b
+ -> Seq Scan on public.sj t3
+ Output: t3.c, t3.b
+ Filter: (t3.c IS NOT NULL)
+(10 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+---------------------------
+ HashSetOp Except All
+ -> Seq Scan on emp1 c2
+ -> Seq Scan on emp1 c3
+(3 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 352abc0bd42..83228cfca29 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -168,10 +168,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_elimination | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 28ed7910d01..7fc2159349b 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c7349eab933..c29d13b9fed 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2175,6 +2175,17 @@ select c.id, ss.a from c
left join (select d.a from onerow, d left join b on d.a = b.id) ss
on c.id = ss.a;
+-- check the case when the placeholder relates to an outer join and its
+-- inner in the press field but actually uses only the outer side of the join
+explain (costs off)
+SELECT q.val FROM b LEFT JOIN (
+ SELECT (q1.z IS NOT NULL) AS val
+ FROM b LEFT JOIN (
+ SELECT (t1.b_id IS NOT NULL) AS z FROM a t1 LEFT JOIN a t2 USING (id)
+ ) AS q1
+ ON true
+) AS q ON true;
+
CREATE TEMP TABLE parted_b (id int PRIMARY KEY) partition by range(id);
CREATE TEMP TABLE parted_b1 partition of parted_b for values from (0) to (10);
@@ -2409,6 +2420,489 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of semi_rhs_exprs links from upper-level semi join to
+-- the removing relation
+explain (verbose, costs off)
+select t1.a from sj t1 where t1.b in (
+ select t2.b from sj t2 join sj t3 on t2.c=t3.c);
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9a3bee93dec..6fd426ab00c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -394,6 +394,7 @@ CatalogId
CatalogIdMapEntry
CatalogIndexState
ChangeVarNodes_context
+ReplaceVarnoContext
CheckPoint
CheckPointStmt
CheckpointStatsData
@@ -2592,6 +2593,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -4050,6 +4052,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.5 (Apple Git-154)
Hi! Thank you for the work with this subject, I think it is really
important.
On 10.02.2025 22:58, Alexander Korotkov wrote:
Hi!
On Mon, Feb 10, 2025 at 7:19 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 9/2/2025 18:41, Alexander Korotkov wrote:
Regarding adjust_relid_set() and replace_relid(). I think they are
now strictly equivalent, except for the case then old relid is given
and not found. In this case adjust_relid_set() returns the original
relids while replace_relid() returns a copy. The behavior of
adjust_relid_set() appears more desirable as we don't need extra
copying when no modification is done. So, I've replaced all
replace_relid() with adjust_relid_set().Ok, I glanced into it, and it makes sense to merge these routines.
I think the comment to adjust_relid_set() should be arranged, too. See
the attachment for a short variant of such modification.Also, I did some grammar correction to your new comment in tests.
Thanks!
I've further revised adjust_relid_set() header comment.
Looking back to the work done since previous attempt to commit this to
pg17, I can highlight following.
1) We're now using more of existing infrastructure including
adjust_relid_set() and ChangeVarNodes(). The most of complexity is
still there though.
2) We've checked few ways to further simplify this patch. But yet the
current way still feels to be best possible.
3) For sure, several bugs were fixed.I think we could give it another chance for pg18 after some further
polishing (at least commit message still needs to be revised). Any
thoughts on this? Tom?
I didn't find any mistakes, I just have a refactoring remark. I think
the part where we add non-redundant expressions with the
binfo_candidates, jinfo_candidates
check can be moved to a separate function, otherwise the code is very
repetitive in this place. I did it and attached diff file
--
Regards,
Alena Rybakina
Postgres Professional
Attachments:
self-join-removal.difftext/x-patch; charset=UTF-8; name=self-join-removal.diffDownload
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0d9b80be9c..d31427693da 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1578,6 +1578,48 @@ restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
return result;
}
+/*
+ * The functions adds all non-redundant clauses to the keeping relation.
+ * Contradictory operation. On the one side, we reduce the length of
+ * restrict lists that can impact planning or executing time.
+ * Additionally, we improve the accuracy of cardinality estimation. On the
+ * other side, it is one more place that can make planning time much
+ * longer in specific cases. It would have been better to avoid calling
+ * the equal() function here, but it's the only way to detect duplicated
+ * inequality expressions.
+ */
+static void
+add_non_redundant_clauses(PlannerInfo *root,
+ List *rinfo_candidates,
+ List *keep_rinfo_list,
+ Index removed_relid)
+{
+ foreach_node(RestrictInfo, rinfo, rinfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(removed_relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, keep_rinfo_list)
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL &&
+ src->parent_ec == rinfo->parent_ec) ||
+ restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+}
+
/*
* Remove a relation after we have proven that it participates only in an
* unneeded unique self join.
@@ -1654,62 +1696,10 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
/*
* Now, add all non-redundant clauses to the keeping relation.
- * Contradictory operation. On the one side, we reduce the length of
- * restrict lists that can impact planning or executing time.
- * Additionally, we improve the accuracy of cardinality estimation. On the
- * other side, it is one more place that can make planning time much
- * longer in specific cases. It would have been better to avoid calling
- * the equal() function here, but it's the only way to detect duplicated
- * inequality expressions.
*/
- foreach_node(RestrictInfo, rinfo, binfo_candidates)
- {
- bool is_redundant = false;
-
- Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
-
- foreach_node(RestrictInfo, src, toKeep->baserestrictinfo)
- {
- if (!bms_equal(src->clause_relids, rinfo->clause_relids))
- /* Const and non-const expressions can't be equal */
- continue;
-
- if (src == rinfo ||
- (rinfo->parent_ec != NULL
- && src->parent_ec == rinfo->parent_ec)
- || restrict_infos_logically_equal(rinfo, src))
- {
- is_redundant = true;
- break;
- }
- }
- if (!is_redundant)
- distribute_restrictinfo_to_rels(root, rinfo);
- }
- foreach_node(RestrictInfo, rinfo, jinfo_candidates)
- {
- bool is_redundant = false;
-
- Assert(!bms_is_member(toRemove->relid, rinfo->required_relids));
+ add_non_redundant_clauses(root, binfo_candidates, toKeep->baserestrictinfo, toRemove->relid);
+ add_non_redundant_clauses(root, jinfo_candidates, toKeep->joininfo, toRemove->relid);
- foreach_node(RestrictInfo, src, toKeep->joininfo)
- {
- if (!bms_equal(src->clause_relids, rinfo->clause_relids))
- /* Can't compare trivially different clauses */
- continue;
-
- if (src == rinfo ||
- (rinfo->parent_ec != NULL
- && src->parent_ec == rinfo->parent_ec)
- || restrict_infos_logically_equal(rinfo, src))
- {
- is_redundant = true;
- break;
- }
- }
- if (!is_redundant)
- distribute_restrictinfo_to_rels(root, rinfo);
- }
list_free(binfo_candidates);
list_free(jinfo_candidates);
Hi!
On Tue, Feb 11, 2025 at 5:31 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
Hi! Thank you for the work with this subject, I think it is really
important.On 10.02.2025 22:58, Alexander Korotkov wrote:
Hi!
On Mon, Feb 10, 2025 at 7:19 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 9/2/2025 18:41, Alexander Korotkov wrote:
Regarding adjust_relid_set() and replace_relid(). I think they are
now strictly equivalent, except for the case then old relid is given
and not found. In this case adjust_relid_set() returns the original
relids while replace_relid() returns a copy. The behavior of
adjust_relid_set() appears more desirable as we don't need extra
copying when no modification is done. So, I've replaced all
replace_relid() with adjust_relid_set().Ok, I glanced into it, and it makes sense to merge these routines.
I think the comment to adjust_relid_set() should be arranged, too. See
the attachment for a short variant of such modification.Also, I did some grammar correction to your new comment in tests.
Thanks!
I've further revised adjust_relid_set() header comment.
Looking back to the work done since previous attempt to commit this to
pg17, I can highlight following.
1) We're now using more of existing infrastructure including
adjust_relid_set() and ChangeVarNodes(). The most of complexity is
still there though.
2) We've checked few ways to further simplify this patch. But yet the
current way still feels to be best possible.
3) For sure, several bugs were fixed.I think we could give it another chance for pg18 after some further
polishing (at least commit message still needs to be revised). Any
thoughts on this? Tom?I didn't find any mistakes, I just have a refactoring remark. I think
the part where we add non-redundant expressions with the
binfo_candidates, jinfo_candidates
check can be moved to a separate function, otherwise the code is very
repetitive in this place. I did it and attached diff file
Thank you. I've integrated that into a patch. However, I've to
change keep_rinfo_list to be passed by pointer to
add_non_redundant_clauses(), because it might be changed in
distribute_restrictinfo_to_rels(). Without that there is a case of
duplicated clause in regression tests.
I've changed 'inner' and 'outer' vise versa in
remove_self_joins_one_group() for better readability (I believe that
was discussed upthread but lost). Also, I did a round of improvement
for comments and commit message.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v13-0001-Implement-Self-Join-Elimination.patchapplication/octet-stream; name=v13-0001-Implement-Self-Join-Elimination.patchDownload
From 59e8c2e9c277ef47efa98d82ba239c0e97618a11 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu, 13 Feb 2025 00:56:03 +0200
Subject: [PATCH v13] Implement Self-Join Elimination
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self-Join Elimination (SJE) feature removes an inner join of a plain
table to itself in the query tree if it is proven that the join can be
replaced with a scan without impacting the query result. Self-join and
inner relation get replaced with the outer in query, equivalence classes,
and planner info structures. Also, the inner restrictlist moves to the
outer one with the removal of duplicated clauses. Thus, this optimization
reduces the length of the range table list (this especially makes sense for
partitioned relations), reduces the number of restriction clauses and,
in turn, selectivity estimations, and potentially improves total planner
prediction for the query.
This feature is dedicated to avoiding redundancy, which can appear after
pull-up transformations or the creation of an EquivalenceClass-derived clause
like the below.
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x);
SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x;
In the future, we could also reduce redundancy caused by subquery pull-up
after unnecessary outer join removal in cases like the one below.
SELECT * FROM t1 WHERE x IN
(SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x);
Also, it can drastically help to join partitioned tables, removing entries
even before their expansion.
The SJE proof is based on innerrel_is_unique() machinery.
We can remove a self-join when for each outer row:
1. At most, one inner row matches the join clause;
2. Each matched inner row must be (physically) the same as the outer one;
3. Inner and outer rows have the same row mark.
In this patch, we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x;
2. Add to the list above the baseretrictinfo of the inner table;
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables;
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
possibility of self-join elimination, the inner and outer clauses must
match exactly.
The relation replacement procedure is not trivial and is partly combined
with the one used to remove useless left joins. Tests covering this feature
were added to join.sql. Some of the existing regression tests changed due
to self-join removal logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Reviewed-by: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Hywel Carver <hywel@skillerwhale.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Michał Kłeczek <michal@kleczek.org>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/equivclass.c | 3 +-
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1241 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 9 +-
src/backend/rewrite/rewriteManip.c | 124 +-
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 40 +-
src/include/optimizer/optimizer.h | 2 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/include/rewrite/rewriteManip.h | 4 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1083 ++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 494 ++++++++
src/tools/pgindent/typedefs.list | 2 +
19 files changed, 2982 insertions(+), 148 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5e4f201e099..82c9e3d9731 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5544,6 +5544,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_elimination" xreflabel="enable_self_join_elimination">
+ <term><varname>enable_self_join_elimination</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_elimination</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 7cafaca33c5..0f9ecf5ee8b 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -852,7 +852,8 @@ find_computable_ec_member(PlannerInfo *root,
exprvars = pull_var_clause((Node *) exprs,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ PVC_INCLUDE_PLACEHOLDERS |
+ PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc, ec->ec_members)
{
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 6e2051efc65..a43ca16d683 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4162,6 +4162,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -4217,6 +4233,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -4268,6 +4285,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -4310,7 +4345,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index b33fc671775..2b3f3ba4707 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -30,27 +31,48 @@
#include "optimizer/placeholder.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. A sorting procedure is needed to simplify the search
+ * of SJE-candidate baserels referencing the same database relation. Having
+ * collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts
+ * to remove self-joins.
+ *
+ * Preliminary sorting prevents quadratic behavior that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_elimination;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
- int relid, int ojrelid);
+ int relid, int ojrelid, int subst);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -88,7 +110,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -276,7 +298,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -288,36 +310,31 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = adjust_relid_set(root->all_baserels, relid, subst);
+ root->outer_join_rels = adjust_relid_set(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = adjust_relid_set(root->all_query_rels, relid, subst);
+ root->all_query_rels = adjust_relid_set(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -341,20 +358,33 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
sjinf->min_righthand = bms_copy(sjinf->min_righthand);
sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
- /* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ /* Now remove relid from the sets: */
+ sjinf->min_lefthand = adjust_relid_set(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = adjust_relid_set(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = adjust_relid_set(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = adjust_relid_set(sjinf->syn_righthand, relid, subst);
+
+ if (sjinfo != NULL)
+ {
+ Assert(subst <= 0 && ojrelid > 0);
+
+ /* Remove ojrelid bits from the sets: */
+ sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
+ sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
+ sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
+ sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ /* relid cannot appear in these fields, but ojrelid can: */
+ sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
+ sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
+ sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
+ sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ }
+ else
+ {
+ Assert(subst > 0 && ojrelid == -1);
+
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
+ }
}
/*
@@ -375,10 +405,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -388,21 +418,112 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
/* Reduce ph_needed to contain only "relation 0"; see below */
if (bms_is_member(0, phinfo->ph_needed))
phinfo->ph_needed = bms_make_singleton(0);
else
phinfo->ph_needed = NULL;
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst);
+
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = adjust_relid_set(phv->phrels, relid, subst);
+ phv->phrels = adjust_relid_set(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+ /*
+ * Likewise remove references from EquivalenceClasses.
+ */
+ foreach(l, root->eq_classes)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
+
+ if (bms_is_member(relid, ec->ec_relids) ||
+ (sjinfo == NULL || bms_is_member(ojrelid, ec->ec_relids)))
+ remove_rel_from_eclass(ec, relid, ojrelid, subst);
+ }
+
+ /*
+ * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
+ * ph_needed relid sets. These have to be known accurately, else we may
+ * fail to remove other now-removable outer joins. And our removal of the
+ * join clause(s) for this outer join may mean that Vars that were
+ * formerly needed no longer are. So we have to do this honestly by
+ * repeating the construction of those relid sets. We can cheat to one
+ * small extent: we can avoid re-examining the targetlist and HAVING qual
+ * by preserving "relation 0" bits from the existing relid sets. This is
+ * safe because we'd never remove such references.
+ *
+ * So, start by removing all other bits from attr_needed sets and
+ * lateral_vars lists. (We already did this above for ph_needed.)
+ */
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[rti];
+ int attroff;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == rti); /* sanity check on array */
+
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0;
+ attroff--)
+ {
+ if (bms_is_member(0, otherrel->attr_needed[attroff]))
+ otherrel->attr_needed[attroff] = bms_make_singleton(0);
+ else
+ otherrel->attr_needed[attroff] = NULL;
+ }
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
+ }
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ RelOptInfo *rel = find_base_rel(root, relid);
+ int ojrelid = sjinfo->ojrelid;
+ Relids joinrelids;
+ Relids join_plus_commute;
+ List *joininfos;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
+
/*
* Remove any joinquals referencing the rel from the joininfo lists.
*
@@ -465,18 +586,6 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
}
}
- /*
- * Likewise remove references from EquivalenceClasses.
- */
- foreach(l, root->eq_classes)
- {
- EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
-
- if (bms_is_member(relid, ec->ec_relids) ||
- bms_is_member(ojrelid, ec->ec_relids))
- remove_rel_from_eclass(ec, relid, ojrelid);
- }
-
/*
* There may be references to the rel in root->fkey_list, but if so,
* match_foreign_keys_to_quals() will get rid of them.
@@ -492,42 +601,6 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
/* And nuke the RelOptInfo, just in case there's another access path */
pfree(rel);
- /*
- * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
- * ph_needed relid sets. These have to be known accurately, else we may
- * fail to remove other now-removable outer joins. And our removal of the
- * join clause(s) for this outer join may mean that Vars that were
- * formerly needed no longer are. So we have to do this honestly by
- * repeating the construction of those relid sets. We can cheat to one
- * small extent: we can avoid re-examining the targetlist and HAVING qual
- * by preserving "relation 0" bits from the existing relid sets. This is
- * safe because we'd never remove such references.
- *
- * So, start by removing all other bits from attr_needed sets. (We
- * already did this above for ph_needed.)
- */
- for (rti = 1; rti < root->simple_rel_array_size; rti++)
- {
- RelOptInfo *otherrel = root->simple_rel_array[rti];
- int attroff;
-
- /* there may be empty slots corresponding to non-baserel RTEs */
- if (otherrel == NULL)
- continue;
-
- Assert(otherrel->relid == rti); /* sanity check on array */
-
- for (attroff = otherrel->max_attr - otherrel->min_attr;
- attroff >= 0;
- attroff--)
- {
- if (bms_is_member(0, otherrel->attr_needed[attroff]))
- otherrel->attr_needed[attroff] = bms_make_singleton(0);
- else
- otherrel->attr_needed[attroff] = NULL;
- }
- }
-
/*
* Now repeat construction of attr_needed bits coming from all other
* sources.
@@ -607,13 +680,13 @@ remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid)
* level(s).
*/
static void
-remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
+remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid, int subst)
{
ListCell *lc;
/* Fix up the EC's overall relids */
- ec->ec_relids = bms_del_member(ec->ec_relids, relid);
- ec->ec_relids = bms_del_member(ec->ec_relids, ojrelid);
+ ec->ec_relids = adjust_relid_set(ec->ec_relids, relid, subst);
+ ec->ec_relids = adjust_relid_set(ec->ec_relids, ojrelid, subst);
/*
* Fix up the member expressions. Any non-const member that ends with
@@ -625,11 +698,11 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc);
if (bms_is_member(relid, cur_em->em_relids) ||
- bms_is_member(ojrelid, cur_em->em_relids))
+ (ojrelid != -1 && bms_is_member(ojrelid, cur_em->em_relids)))
{
Assert(!cur_em->em_is_const);
- cur_em->em_relids = bms_del_member(cur_em->em_relids, relid);
- cur_em->em_relids = bms_del_member(cur_em->em_relids, ojrelid);
+ cur_em->em_relids = adjust_relid_set(cur_em->em_relids, relid, subst);
+ cur_em->em_relids = adjust_relid_set(cur_em->em_relids, ojrelid, subst);
if (bms_is_empty(cur_em->em_relids))
ec->ec_members = foreach_delete_current(ec->ec_members, lc);
}
@@ -640,7 +713,10 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
- remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
+ if (ojrelid == -1)
+ ChangeVarNodes((Node *) rinfo, relid, subst, 0);
+ else
+ remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
}
/*
@@ -844,9 +920,15 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * (*extra_clauses) to be set to the right sides of baserestrictinfo clauses,
+ * looking like "x = const" if distinctness is derived from such clauses, not
+ * joininfo clauses. Pass NULL to the extra_clauses if this value is not
+ * needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +941,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,9 +1259,35 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to (*extra_clauses)
+ * additional clauses from a baserestrictinfo list used to prove the
+ * uniqueness.
+ *
+ * A non-NULL extra_clauses indicates that we're checking for self-join and
+ * correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1193,17 +1302,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non-self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels because we need extra clauses to be valid for our case. Also,
+ * for self-join checking we've filtered the clauses list. Thus, we can
+ * match only the result cached for a self-join search for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1340,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1354,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1409,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1312,17 +1440,896 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list and updated these clauses to reference the remaining
+ * relation, so that we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there might be join clauses tying the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler just to
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+
+ foreach_node(EquivalenceMember, em, ec->ec_members)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = adjust_relid_set(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = adjust_relid_set(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach_node(EquivalenceMember, other, new_members)
+ {
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach_node(RestrictInfo, rinfo, ec->ec_sources)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach_node(RestrictInfo, other, new_sources)
+ {
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = adjust_relid_set(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+/*
+ * This function adds all non-redundant clauses to the keeping relation
+ * during self-join elimination. That is a contradictory operation. On the
+ * one hand, we reduce the length of the `restrict` lists, which can
+ * impact planning or executing time. Additionally, we improve the
+ * accuracy of cardinality estimation. On the other hand, it is one more
+ * place that can make planning time much longer in specific cases. It
+ * would have been better to avoid calling the equal() function here, but
+ * it's the only way to detect duplicated inequality expressions.
+ *
+ * (*keep_rinfo_list) is given by pointer because it might be altered by
+ * distribute_restrictinfo_to_rels().
+ */
+static void
+add_non_redundant_clauses(PlannerInfo *root,
+ List *rinfo_candidates,
+ List **keep_rinfo_list,
+ Index removed_relid)
+{
+ foreach_node(RestrictInfo, rinfo, rinfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(removed_relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, (*keep_rinfo_list))
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL &&
+ src->parent_ec == rinfo->parent_ec) ||
+ restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self-join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo, respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ *
+ * NOTE: Remember to keep the code in sync with PlannerInfo to be sure all
+ * cached relids and relid bitmapsets can be correctly cleaned during the
+ * self-join elimination procedure.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach_node(RestrictInfo, rinfo, joininfos)
+ {
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach_node(RestrictInfo, rinfo, toRemove->baserestrictinfo)
+ {
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ */
+ add_non_redundant_clauses(root, binfo_candidates,
+ &toKeep->baserestrictinfo, toRemove->relid);
+ add_non_redundant_clauses(root, jinfo_candidates,
+ &toKeep->joininfo, toRemove->relid);
+
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = adjust_relid_set(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Replace varno in all the query structures, except nodes RangeTblRef
+ * otherwise later remove_rel_from_joinlist will yield errors.
+ */
+ ChangeVarNodesExtended((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ adjust_relid_set(root->all_result_relids, toRemove->relid, toKeep->relid);
+ adjust_relid_set(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path. */
+ pfree(toRemove);
+
+ /*
+ * Now repeat construction of attr_needed bits coming from all other
+ * sources.
+ */
+ rebuild_placeholder_attr_needed(root);
+ rebuild_joinclause_attr_needed(root);
+ rebuild_eclass_attr_needed(root);
+ rebuild_lateral_attr_needed(root);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach_node(RestrictInfo, rinfo, joinquals)
+ {
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ foreach_node(RestrictInfo, rinfo, uclauses)
+ {
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach_node(RestrictInfo, orinfo, outer->baserestrictinfo)
+ {
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses that aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self-joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate the join of two relations if they
+ * belong to different rules of order. Otherwise, the planner
+ * can't find any variants of the correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g., one is
+ * locked FOR UPDATE, and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self-joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+ if (restrictlist == NIL)
+ continue;
+
+ /*
+ * Process restrictlist to separate the self-join quals from the
+ * other quals. e.g., "x = x" goes to selfjoinquals and "a = b" to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ Assert(list_length(restrictlist) ==
+ (list_length(selfjoinquals) + list_length(otherjoinquals)));
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * 'uclauses' is the copy of outer->baserestrictinfo that are
+ * associated with an index. We proved by matching selfjoinquals
+ * to a unique index that the outer relation has at most one
+ * matching row for each inner row. Sometimes that is not enough.
+ * e.g. "WHERE s1.b = s2.b AND s1.a = 1 AND s2.a = 2" when the
+ * unique index is (a,b). Having non-empty uclauses, we must
+ * validate that the inner baserestrictinfo contains the same
+ * expressions, or we won't match the same row on each side of the
+ * join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be
+ * removed, and these relations should not have TABLESAMPLE
+ * clauses specified. Removing a relation with TABLESAMPLE clause
+ * could potentially change the syntax of the query. Because of
+ * UPDATE/DELETE EPQ meachanism, currently Query->resultRelation
+ * or Query->mergeTargetRelation associated rel cannot be
+ * eliminated.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL &&
+ varno != root->parse->resultRelation &&
+ varno != root->parse->mergeTargetRelation)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_elimination || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index ade23fd9d56..5467e094ca7 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -233,6 +233,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 7c27dc24e21..eab44da65b8 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -639,14 +639,17 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
* generate_append_tlist, and those would confuse estimate_num_groups
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
- * be an RTE corresponding to each setop's output.
+ * be an RTE corresponding to each setop's output. Note, we use this not
+ * subquery's targetlist but subroot->parse's targetlist, because it was
+ * revised by self-join removal. subquery's targetlist might contain the
+ * references to the removed relids.
*/
if (pNumGroups)
{
@@ -659,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index a115b217c91..bc8e6c89a6f 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -64,7 +64,6 @@ static bool locate_windowfunc_walker(Node *node,
locate_windowfunc_context *context);
static bool checkExprHasSubLink_walker(Node *node, void *context);
static Relids offset_relid_set(Relids relids, int offset);
-static Relids adjust_relid_set(Relids relids, int oldrelid, int newrelid);
static Node *add_nulling_relids_mutator(Node *node,
add_nulling_relids_context *context);
static Node *remove_nulling_relids_mutator(Node *node,
@@ -544,6 +543,7 @@ offset_relid_set(Relids relids, int offset)
* to 'new_index'. The varnosyn fields are changed too. Also, adjust other
* nodes that contain rangetable indexes, such as RangeTblRef and JoinExpr.
*
+ * change_RangeTblRef: do we change RangeTblRef or not. see ChangeVarNodesExtended.
* NOTE: although this has the form of a walker, we cheat and modify the
* nodes in-place. The given expression tree should have been copied
* earlier to ensure that no unwanted side-effects occur!
@@ -554,6 +554,7 @@ typedef struct
int rt_index;
int new_index;
int sublevels_up;
+ bool change_RangeTblRef;
} ChangeVarNodes_context;
static bool
@@ -586,7 +587,7 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
cexpr->cvarno = context->new_index;
return false;
}
- if (IsA(node, RangeTblRef))
+ if (IsA(node, RangeTblRef) && context->change_RangeTblRef)
{
RangeTblRef *rtr = (RangeTblRef *) node;
@@ -633,6 +634,75 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ adjust_relid_set(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ adjust_relid_set(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ adjust_relid_set(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ adjust_relid_set(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ adjust_relid_set(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ adjust_relid_set(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * for SJE (self join elimination), changing varnos will make t1.a
+ * = t2.a to "t1.a = t1.a", which is always true, we can optizmied
+ * it out. to optimzied it out, we use equal to validate, but we
+ * also need to add a not null qual (NullTest).
+ *
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
@@ -665,32 +735,31 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
return expression_tree_walker(node, ChangeVarNodes_walker, context);
}
+/*
+ * almost equivalent to ChangeVarNodes. also see comments in ChangeVarNodes.
+ * difference with ChangeVarNodes:
+ * when we use ChangeVarNodes for node we change all of it's underlying nodes.
+ * but for SJE we don't want to change node type: RangeTblRef, in some occasion.
+ * because SJE last step, remove_rel_from_joinlist is to
+ * remove left node (RangeTblRef) one by one. if in any case while using
+ * ChangeVarNodes, by accidently leaf node RangeTblRef also being updated
+ * then remove_rel_from_joinlist will have error.
+*/
void
-ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef)
{
ChangeVarNodes_context context;
context.rt_index = rt_index;
context.new_index = new_index;
context.sublevels_up = sublevels_up;
+ context.change_RangeTblRef = change_RangeTblRef;
- /*
- * Must be prepared to start with a Query or a bare expression tree; if
- * it's a Query, go straight to query_tree_walker to make sure that
- * sublevels_up doesn't get incremented prematurely.
- */
if (node && IsA(node, Query))
{
Query *qry = (Query *) node;
- /*
- * If we are starting at a Query, and sublevels_up is zero, then we
- * must also fix rangetable indexes in the Query itself --- namely
- * resultRelation, mergeTargetRelation, exclRelIndex and rowMarks
- * entries. sublevels_up cannot be zero when recursing into a
- * subquery, so there's no need to have the same logic inside
- * ChangeVarNodes_walker.
- */
if (sublevels_up == 0)
{
ListCell *l;
@@ -701,7 +770,6 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
if (qry->mergeTargetRelation == rt_index)
qry->mergeTargetRelation = new_index;
- /* this is unlikely to ever be used, but ... */
if (qry->onConflict && qry->onConflict->exclRelIndex == rt_index)
qry->onConflict->exclRelIndex = new_index;
@@ -719,15 +787,22 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
ChangeVarNodes_walker(node, &context);
}
+void
+ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+{
+ ChangeVarNodesExtended(node, rt_index, new_index, sublevels_up, true);
+}
+
/*
- * Substitute newrelid for oldrelid in a Relid set
+ * adjust_relid_set - substitute newrelid for oldrelid in a Relid set
*
- * Note: some extensions may pass a special varno such as INDEX_VAR for
- * oldrelid. bms_is_member won't like that, but we should tolerate it.
- * (Perhaps newrelid could also be a special varno, but there had better
- * not be a reason to inject that into a nullingrels or phrels set.)
+ * Attempt to remove oldrelid from a Relid set (as long as it's not a special
+ * varno). If oldrelid was found and removed, insert newrelid into a Relid
+ * set (as long as it's not a special varno). Therefore, when oldrelid is
+ * a special varno, this function does nothing. When newrelid is a special
+ * varno, this function behaves as delete.
*/
-static Relids
+Relids
adjust_relid_set(Relids relids, int oldrelid, int newrelid)
{
if (!IS_SPECIAL_VARNO(oldrelid) && bms_is_member(oldrelid, relids))
@@ -736,7 +811,8 @@ adjust_relid_set(Relids relids, int oldrelid, int newrelid)
relids = bms_copy(relids);
/* Remove old, add new */
relids = bms_del_member(relids, oldrelid);
- relids = bms_add_member(relids, newrelid);
+ if (!IS_SPECIAL_VARNO(newrelid))
+ relids = bms_add_member(relids, newrelid);
}
return relids;
}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 226af43fe23..2253366513e 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -988,6 +988,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_elimination", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_elimination,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 00c700cc3e7..fbf05322c75 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -201,6 +201,11 @@ typedef struct PlannerGlobal
* Not all fields are printed. (In some cases, there is no print support for
* the field type; in others, doing so would lead to infinite recursion or
* bloat dump output more than seems useful.)
+ *
+ * NOTE: When adding new entries containing relids and relid bitmapsets,
+ * remember to check that they will be correctly processed by
+ * the remove_self_join_rel function - relid of removing relation will be
+ * correctly replaced with the keeping one.
*----------
*/
#ifndef HAVE_PLANNERINFO_TYPEDEF
@@ -753,7 +758,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -992,7 +997,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3463,4 +3468,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index bcf8ed645c2..78e05d88c8e 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -192,6 +192,8 @@ extern SortGroupClause *get_sortgroupref_clause_noerr(Index sortref,
* output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0020 /* recurse into PlaceHolderVar
* arguments */
+#define PVC_INCLUDE_CONVERTROWTYPES 0x0040 /* include ConvertRowtypeExprs in
+ * output list */
extern Bitmapset *pull_varnos(PlannerInfo *root, Node *node);
extern Bitmapset *pull_varnos_of_level(PlannerInfo *root, Node *node, int levelsup);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 46955d128f0..bc5dfd7db41 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index fee3378bbe3..5a930199611 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_elimination;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -113,6 +114,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index 512823033b9..ddd45099a94 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -41,11 +42,14 @@ typedef enum ReplaceVarsNoMatchOption
} ReplaceVarsNoMatchOption;
+extern Bitmapset *adjust_relid_set(Relids relids, int oldId, int newId);
extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
List *src_rtable, List *src_perminfos);
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
extern void ChangeVarNodes(Node *node, int rt_index, int new_index,
int sublevels_up);
+extern void ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef);
extern void IncrementVarSublevelsUp(Node *node, int delta_sublevels_up,
int min_sublevels_up);
extern void IncrementVarSublevelsUp_rtable(List *rtable,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 56227505009..ad8ab294ff6 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -434,6 +434,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 3ffc066b1f8..a57bb18c24f 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5966,6 +5966,27 @@ select c.id, ss.a from c
-> Seq Scan on c
(7 rows)
+-- check the case when the placeholder relates to an outer join and its
+-- inner in the press field but actually uses only the outer side of the join
+explain (costs off)
+SELECT q.val FROM b LEFT JOIN (
+ SELECT (q1.z IS NOT NULL) AS val
+ FROM b LEFT JOIN (
+ SELECT (t1.b_id IS NOT NULL) AS z FROM a t1 LEFT JOIN a t2 USING (id)
+ ) AS q1
+ ON true
+) AS q ON true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on b
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on b b_1
+ -> Materialize
+ -> Seq Scan on a t1
+(7 rows)
+
CREATE TEMP TABLE parted_b (id int PRIMARY KEY) partition by range(id);
CREATE TEMP TABLE parted_b1 partition of parted_b for values from (0) to (10);
-- test join removals on a partitioned table
@@ -6377,6 +6398,1068 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of semi_rhs_exprs links from upper-level semi join to
+-- the removing relation
+explain (verbose, costs off)
+select t1.a from sj t1 where t1.b in (
+ select t2.b from sj t2 join sj t3 on t2.c=t3.c);
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Semi Join
+ Output: t1.a
+ Join Filter: (t1.b = t3.b)
+ -> Seq Scan on public.sj t1
+ Output: t1.a, t1.b, t1.c
+ -> Materialize
+ Output: t3.c, t3.b
+ -> Seq Scan on public.sj t3
+ Output: t3.c, t3.b
+ Filter: (t3.c IS NOT NULL)
+(10 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+---------------------------
+ HashSetOp Except All
+ -> Seq Scan on emp1 c2
+ -> Seq Scan on emp1 c3
+(3 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 352abc0bd42..83228cfca29 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -168,10 +168,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_elimination | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 28ed7910d01..7fc2159349b 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c7349eab933..c29d13b9fed 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2175,6 +2175,17 @@ select c.id, ss.a from c
left join (select d.a from onerow, d left join b on d.a = b.id) ss
on c.id = ss.a;
+-- check the case when the placeholder relates to an outer join and its
+-- inner in the press field but actually uses only the outer side of the join
+explain (costs off)
+SELECT q.val FROM b LEFT JOIN (
+ SELECT (q1.z IS NOT NULL) AS val
+ FROM b LEFT JOIN (
+ SELECT (t1.b_id IS NOT NULL) AS z FROM a t1 LEFT JOIN a t2 USING (id)
+ ) AS q1
+ ON true
+) AS q ON true;
+
CREATE TEMP TABLE parted_b (id int PRIMARY KEY) partition by range(id);
CREATE TEMP TABLE parted_b1 partition of parted_b for values from (0) to (10);
@@ -2409,6 +2420,489 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of semi_rhs_exprs links from upper-level semi join to
+-- the removing relation
+explain (verbose, costs off)
+select t1.a from sj t1 where t1.b in (
+ select t2.b from sj t2 join sj t3 on t2.c=t3.c);
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b6c170ac249..bce4214503d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2593,6 +2593,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -4056,6 +4057,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.5 (Apple Git-154)
On Thu, Feb 13, 2025 at 1:00 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Tue, Feb 11, 2025 at 5:31 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:Hi! Thank you for the work with this subject, I think it is really
important.On 10.02.2025 22:58, Alexander Korotkov wrote:
Hi!
On Mon, Feb 10, 2025 at 7:19 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 9/2/2025 18:41, Alexander Korotkov wrote:
Regarding adjust_relid_set() and replace_relid(). I think they are
now strictly equivalent, except for the case then old relid is given
and not found. In this case adjust_relid_set() returns the original
relids while replace_relid() returns a copy. The behavior of
adjust_relid_set() appears more desirable as we don't need extra
copying when no modification is done. So, I've replaced all
replace_relid() with adjust_relid_set().Ok, I glanced into it, and it makes sense to merge these routines.
I think the comment to adjust_relid_set() should be arranged, too. See
the attachment for a short variant of such modification.Also, I did some grammar correction to your new comment in tests.
Thanks!
I've further revised adjust_relid_set() header comment.
Looking back to the work done since previous attempt to commit this to
pg17, I can highlight following.
1) We're now using more of existing infrastructure including
adjust_relid_set() and ChangeVarNodes(). The most of complexity is
still there though.
2) We've checked few ways to further simplify this patch. But yet the
current way still feels to be best possible.
3) For sure, several bugs were fixed.I think we could give it another chance for pg18 after some further
polishing (at least commit message still needs to be revised). Any
thoughts on this? Tom?I didn't find any mistakes, I just have a refactoring remark. I think
the part where we add non-redundant expressions with the
binfo_candidates, jinfo_candidates
check can be moved to a separate function, otherwise the code is very
repetitive in this place. I did it and attached diff fileThank you. I've integrated that into a patch. However, I've to
change keep_rinfo_list to be passed by pointer to
add_non_redundant_clauses(), because it might be changed in
distribute_restrictinfo_to_rels(). Without that there is a case of
duplicated clause in regression tests.I've changed 'inner' and 'outer' vise versa in
remove_self_joins_one_group() for better readability (I believe that
was discussed upthread but lost). Also, I did a round of improvement
for comments and commit message.
I've corrected some spelling error reported by Alexander Lakhin
privately to me. Also, I've revised comments around ChangeVarNodes()
and ChangeVarNodesExtended(). I'm going to continue nitpicking this
patch during next couple days then push it if no objections.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v14-0001-Implement-Self-Join-Elimination.patchapplication/octet-stream; name=v14-0001-Implement-Self-Join-Elimination.patchDownload
From 2facc72615b5ed323751ce91bdf096d971d0e77d Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Thu, 13 Feb 2025 00:56:03 +0200
Subject: [PATCH v14] Implement Self-Join Elimination
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The Self-Join Elimination (SJE) feature removes an inner join of a plain
table to itself in the query tree if it is proven that the join can be
replaced with a scan without impacting the query result. Self-join and
inner relation get replaced with the outer in query, equivalence classes,
and planner info structures. Also, the inner restrictlist moves to the
outer one with the removal of duplicated clauses. Thus, this optimization
reduces the length of the range table list (this especially makes sense for
partitioned relations), reduces the number of restriction clauses and,
in turn, selectivity estimations, and potentially improves total planner
prediction for the query.
This feature is dedicated to avoiding redundancy, which can appear after
pull-up transformations or the creation of an EquivalenceClass-derived clause
like the below.
SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x);
SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x;
In the future, we could also reduce redundancy caused by subquery pull-up
after unnecessary outer join removal in cases like the one below.
SELECT * FROM t1 WHERE x IN
(SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x);
Also, it can drastically help to join partitioned tables, removing entries
even before their expansion.
The SJE proof is based on innerrel_is_unique() machinery.
We can remove a self-join when for each outer row:
1. At most, one inner row matches the join clause;
2. Each matched inner row must be (physically) the same as the outer one;
3. Inner and outer rows have the same row mark.
In this patch, we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x;
2. Add to the list above the baseretrictinfo of the inner table;
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables;
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
possibility of self-join elimination, the inner and outer clauses must
match exactly.
The relation replacement procedure is not trivial and is partly combined
with the one used to remove useless left joins. Tests covering this feature
were added to join.sql. Some of the existing regression tests changed due
to self-join removal logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Reviewed-by: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Hywel Carver <hywel@skillerwhale.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Michał Kłeczek <michal@kleczek.org>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
---
doc/src/sgml/config.sgml | 16 +
src/backend/optimizer/path/equivclass.c | 3 +-
src/backend/optimizer/path/indxpath.c | 39 +
src/backend/optimizer/plan/analyzejoins.c | 1240 +++++++++++++++++++--
src/backend/optimizer/plan/planmain.c | 5 +
src/backend/optimizer/prep/prepunion.c | 9 +-
src/backend/rewrite/rewriteManip.c | 126 ++-
src/backend/utils/misc/guc_tables.c | 10 +
src/include/nodes/pathnodes.h | 40 +-
src/include/optimizer/optimizer.h | 2 +
src/include/optimizer/paths.h | 3 +
src/include/optimizer/planmain.h | 6 +
src/include/rewrite/rewriteManip.h | 4 +
src/test/regress/expected/equivclass.out | 30 +
src/test/regress/expected/join.out | 1083 ++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/equivclass.sql | 16 +
src/test/regress/sql/join.sql | 494 ++++++++
src/tools/pgindent/typedefs.list | 2 +
19 files changed, 2983 insertions(+), 148 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 60829b79d83..336630ce417 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5544,6 +5544,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable_self_join_elimination" xreflabel="enable_self_join_elimination">
+ <term><varname>enable_self_join_elimination</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_self_join_elimination</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's optimization which analyses
+ the query tree and replaces self joins with semantically equivalent
+ single scans. Takes into consideration only plain tables.
+ The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 7cafaca33c5..0f9ecf5ee8b 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -852,7 +852,8 @@ find_computable_ec_member(PlannerInfo *root,
exprvars = pull_var_clause((Node *) exprs,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
- PVC_INCLUDE_PLACEHOLDERS);
+ PVC_INCLUDE_PLACEHOLDERS |
+ PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc, ec->ec_members)
{
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 6e2051efc65..a43ca16d683 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4162,6 +4162,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
+{
+ return relation_has_unique_index_ext(root, rel, restrictlist,
+ exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * Same as relation_has_unique_index_for(), but supports extra_clauses
+ * parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
+ * which were used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List *oprlist,
+ List **extra_clauses)
{
ListCell *ic;
@@ -4217,6 +4233,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+ List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -4268,6 +4285,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+ if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ {
+ MemoryContext oldMemCtx =
+ MemoryContextSwitchTo(root->planner_cxt);
+
+ /*
+ * Add filter clause into a list allowing caller to
+ * know if uniqueness have made not only by join
+ * clauses.
+ */
+ Assert(bms_is_empty(rinfo->left_relids) ||
+ bms_is_empty(rinfo->right_relids));
+ if (extra_clauses)
+ exprs = lappend(exprs, rinfo);
+ MemoryContextSwitchTo(oldMemCtx);
+ }
+
break;
}
}
@@ -4310,7 +4345,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
+ {
+ if (extra_clauses)
+ *extra_clauses = exprs;
return true;
+ }
}
return false;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index b33fc671775..3aa04d0d4e1 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
*/
#include "postgres.h"
+#include "catalog/pg_class.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/joininfo.h"
#include "optimizer/optimizer.h"
@@ -30,27 +31,48 @@
#include "optimizer/placeholder.h"
#include "optimizer/planmain.h"
#include "optimizer/restrictinfo.h"
+#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+/*
+ * Utility structure. A sorting procedure is needed to simplify the search
+ * of SJE-candidate baserels referencing the same database relation. Having
+ * collected all baserels from the query jointree, the planner sorts them
+ * according to the reloid value, groups them with the next pass and attempts
+ * to remove self-joins.
+ *
+ * Preliminary sorting prevents quadratic behavior that can be harmful in the
+ * case of numerous joins.
+ */
+typedef struct
+{
+ int relid;
+ Oid reloid;
+} SelfJoinCandidate;
+
+bool enable_self_join_elimination;
+
/* local functions */
static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
-static void remove_rel_from_query(PlannerInfo *root, int relid,
- SpecialJoinInfo *sjinfo);
+static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo);
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
- int relid, int ojrelid);
+ int relid, int ojrelid, int subst);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
- List *clause_list);
+ List *clause_list, List **extra_clauses);
static Oid distinct_col_search(int colno, List *colnos, List *opids);
static bool is_innerrel_unique_for(PlannerInfo *root,
Relids joinrelids,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist);
+ List *restrictlist,
+ List **extra_clauses);
+static int self_join_candidates_cmp(const void *a, const void *b);
/*
@@ -88,7 +110,7 @@ restart:
*/
innerrelid = bms_singleton_member(sjinfo->min_righthand);
- remove_rel_from_query(root, innerrelid, sjinfo);
+ remove_leftjoinrel_from_query(root, innerrelid, sjinfo);
/* We verify that exactly one reference gets removed from joinlist */
nremoved = 0;
@@ -276,7 +298,7 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
* Now that we have the relevant equality join clauses, try to prove the
* innerrel distinct.
*/
- if (rel_is_distinct_for(root, innerrel, clause_list))
+ if (rel_is_distinct_for(root, innerrel, clause_list, NULL))
return true;
/*
@@ -288,36 +310,31 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
/*
- * Remove the target relid and references to the target join from the
+ * Remove the target rel->relid and references to the target join from the
* planner's data structures, having determined that there is no need
- * to include them in the query.
+ * to include them in the query. Optionally replace them with subst if subst
+ * is non-negative.
*
- * We are not terribly thorough here. We only bother to update parts of
- * the planner's data structures that will actually be consulted later.
+ * This function updates only parts needed for both left-join removal and
+ * self-join removal.
*/
static void
-remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
+remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
+ int subst, SpecialJoinInfo *sjinfo,
+ Relids joinrelids)
{
- RelOptInfo *rel = find_base_rel(root, relid);
- int ojrelid = sjinfo->ojrelid;
- Relids joinrelids;
- Relids join_plus_commute;
- List *joininfos;
+ int relid = rel->relid;
+ int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
- /* Compute the relid set for the join we are considering */
- joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
- Assert(ojrelid != 0);
- joinrelids = bms_add_member(joinrelids, ojrelid);
-
/*
* Update all_baserels and related relid sets.
*/
- root->all_baserels = bms_del_member(root->all_baserels, relid);
- root->outer_join_rels = bms_del_member(root->outer_join_rels, ojrelid);
- root->all_query_rels = bms_del_member(root->all_query_rels, relid);
- root->all_query_rels = bms_del_member(root->all_query_rels, ojrelid);
+ root->all_baserels = adjust_relid_set(root->all_baserels, relid, subst);
+ root->outer_join_rels = adjust_relid_set(root->outer_join_rels, ojrelid, subst);
+ root->all_query_rels = adjust_relid_set(root->all_query_rels, relid, subst);
+ root->all_query_rels = adjust_relid_set(root->all_query_rels, ojrelid, subst);
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -341,20 +358,33 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
sjinf->min_righthand = bms_copy(sjinf->min_righthand);
sjinf->syn_lefthand = bms_copy(sjinf->syn_lefthand);
sjinf->syn_righthand = bms_copy(sjinf->syn_righthand);
- /* Now remove relid and ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, relid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, relid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, relid);
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
- /* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ /* Now remove relid from the sets: */
+ sjinf->min_lefthand = adjust_relid_set(sjinf->min_lefthand, relid, subst);
+ sjinf->min_righthand = adjust_relid_set(sjinf->min_righthand, relid, subst);
+ sjinf->syn_lefthand = adjust_relid_set(sjinf->syn_lefthand, relid, subst);
+ sjinf->syn_righthand = adjust_relid_set(sjinf->syn_righthand, relid, subst);
+
+ if (sjinfo != NULL)
+ {
+ Assert(subst <= 0 && ojrelid > 0);
+
+ /* Remove ojrelid bits from the sets: */
+ sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
+ sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
+ sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
+ sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ /* relid cannot appear in these fields, but ojrelid can: */
+ sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
+ sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
+ sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
+ sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ }
+ else
+ {
+ Assert(subst > 0 && ojrelid == -1);
+
+ ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
+ }
}
/*
@@ -375,10 +405,10 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
- Assert(!bms_is_member(relid, phinfo->ph_lateral));
+ Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- !bms_is_member(ojrelid, phinfo->ph_eval_at))
+ (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -388,21 +418,112 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
{
PlaceHolderVar *phv = phinfo->ph_var;
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, relid);
- phinfo->ph_eval_at = bms_del_member(phinfo->ph_eval_at, ojrelid);
+ phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at, relid, subst);
+ phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at, ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
/* Reduce ph_needed to contain only "relation 0"; see below */
if (bms_is_member(0, phinfo->ph_needed))
phinfo->ph_needed = bms_make_singleton(0);
else
phinfo->ph_needed = NULL;
- phv->phrels = bms_del_member(phv->phrels, relid);
- phv->phrels = bms_del_member(phv->phrels, ojrelid);
+
+ phinfo->ph_lateral = adjust_relid_set(phinfo->ph_lateral, relid, subst);
+
+ /*
+ * ph_lateral might contain rels mentioned in ph_eval_at after the
+ * replacement, remove them.
+ */
+ phinfo->ph_lateral = bms_difference(phinfo->ph_lateral, phinfo->ph_eval_at);
+ /* ph_lateral might or might not be empty */
+
+ phv->phrels = adjust_relid_set(phv->phrels, relid, subst);
+ phv->phrels = adjust_relid_set(phv->phrels, ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
+
+ ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
+
Assert(phv->phnullingrels == NULL); /* no need to adjust */
}
}
+ /*
+ * Likewise remove references from EquivalenceClasses.
+ */
+ foreach(l, root->eq_classes)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
+
+ if (bms_is_member(relid, ec->ec_relids) ||
+ (sjinfo == NULL || bms_is_member(ojrelid, ec->ec_relids)))
+ remove_rel_from_eclass(ec, relid, ojrelid, subst);
+ }
+
+ /*
+ * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
+ * ph_needed relid sets. These have to be known accurately, else we may
+ * fail to remove other now-removable outer joins. And our removal of the
+ * join clause(s) for this outer join may mean that Vars that were
+ * formerly needed no longer are. So we have to do this honestly by
+ * repeating the construction of those relid sets. We can cheat to one
+ * small extent: we can avoid re-examining the targetlist and HAVING qual
+ * by preserving "relation 0" bits from the existing relid sets. This is
+ * safe because we'd never remove such references.
+ *
+ * So, start by removing all other bits from attr_needed sets and
+ * lateral_vars lists. (We already did this above for ph_needed.)
+ */
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *otherrel = root->simple_rel_array[rti];
+ int attroff;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (otherrel == NULL)
+ continue;
+
+ Assert(otherrel->relid == rti); /* sanity check on array */
+
+ for (attroff = otherrel->max_attr - otherrel->min_attr;
+ attroff >= 0;
+ attroff--)
+ {
+ if (bms_is_member(0, otherrel->attr_needed[attroff]))
+ otherrel->attr_needed[attroff] = bms_make_singleton(0);
+ else
+ otherrel->attr_needed[attroff] = NULL;
+ }
+
+ if (subst > 0)
+ ChangeVarNodes((Node *) otherrel->lateral_vars, relid, subst, 0);
+ }
+}
+
+/*
+ * Remove the target relid and references to the target join from the
+ * planner's data structures, having determined that there is no need
+ * to include them in the query.
+ *
+ * We are not terribly thorough here. We only bother to update parts of
+ * the planner's data structures that will actually be consulted later.
+ */
+static void
+remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
+ SpecialJoinInfo *sjinfo)
+{
+ RelOptInfo *rel = find_base_rel(root, relid);
+ int ojrelid = sjinfo->ojrelid;
+ Relids joinrelids;
+ Relids join_plus_commute;
+ List *joininfos;
+ ListCell *l;
+
+ /* Compute the relid set for the join we are considering */
+ joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+ Assert(ojrelid != 0);
+ joinrelids = bms_add_member(joinrelids, ojrelid);
+
+ remove_rel_from_query(root, rel, -1, sjinfo, joinrelids);
+
/*
* Remove any joinquals referencing the rel from the joininfo lists.
*
@@ -465,18 +586,6 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
}
}
- /*
- * Likewise remove references from EquivalenceClasses.
- */
- foreach(l, root->eq_classes)
- {
- EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
-
- if (bms_is_member(relid, ec->ec_relids) ||
- bms_is_member(ojrelid, ec->ec_relids))
- remove_rel_from_eclass(ec, relid, ojrelid);
- }
-
/*
* There may be references to the rel in root->fkey_list, but if so,
* match_foreign_keys_to_quals() will get rid of them.
@@ -492,42 +601,6 @@ remove_rel_from_query(PlannerInfo *root, int relid, SpecialJoinInfo *sjinfo)
/* And nuke the RelOptInfo, just in case there's another access path */
pfree(rel);
- /*
- * Finally, we must recompute per-Var attr_needed and per-PlaceHolderVar
- * ph_needed relid sets. These have to be known accurately, else we may
- * fail to remove other now-removable outer joins. And our removal of the
- * join clause(s) for this outer join may mean that Vars that were
- * formerly needed no longer are. So we have to do this honestly by
- * repeating the construction of those relid sets. We can cheat to one
- * small extent: we can avoid re-examining the targetlist and HAVING qual
- * by preserving "relation 0" bits from the existing relid sets. This is
- * safe because we'd never remove such references.
- *
- * So, start by removing all other bits from attr_needed sets. (We
- * already did this above for ph_needed.)
- */
- for (rti = 1; rti < root->simple_rel_array_size; rti++)
- {
- RelOptInfo *otherrel = root->simple_rel_array[rti];
- int attroff;
-
- /* there may be empty slots corresponding to non-baserel RTEs */
- if (otherrel == NULL)
- continue;
-
- Assert(otherrel->relid == rti); /* sanity check on array */
-
- for (attroff = otherrel->max_attr - otherrel->min_attr;
- attroff >= 0;
- attroff--)
- {
- if (bms_is_member(0, otherrel->attr_needed[attroff]))
- otherrel->attr_needed[attroff] = bms_make_singleton(0);
- else
- otherrel->attr_needed[attroff] = NULL;
- }
- }
-
/*
* Now repeat construction of attr_needed bits coming from all other
* sources.
@@ -607,13 +680,13 @@ remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid)
* level(s).
*/
static void
-remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
+remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid, int subst)
{
ListCell *lc;
/* Fix up the EC's overall relids */
- ec->ec_relids = bms_del_member(ec->ec_relids, relid);
- ec->ec_relids = bms_del_member(ec->ec_relids, ojrelid);
+ ec->ec_relids = adjust_relid_set(ec->ec_relids, relid, subst);
+ ec->ec_relids = adjust_relid_set(ec->ec_relids, ojrelid, subst);
/*
* Fix up the member expressions. Any non-const member that ends with
@@ -625,11 +698,11 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc);
if (bms_is_member(relid, cur_em->em_relids) ||
- bms_is_member(ojrelid, cur_em->em_relids))
+ (ojrelid != -1 && bms_is_member(ojrelid, cur_em->em_relids)))
{
Assert(!cur_em->em_is_const);
- cur_em->em_relids = bms_del_member(cur_em->em_relids, relid);
- cur_em->em_relids = bms_del_member(cur_em->em_relids, ojrelid);
+ cur_em->em_relids = adjust_relid_set(cur_em->em_relids, relid, subst);
+ cur_em->em_relids = adjust_relid_set(cur_em->em_relids, ojrelid, subst);
if (bms_is_empty(cur_em->em_relids))
ec->ec_members = foreach_delete_current(ec->ec_members, lc);
}
@@ -640,7 +713,10 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
- remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
+ if (ojrelid == -1)
+ ChangeVarNodes((Node *) rinfo, relid, subst, 0);
+ else
+ remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
}
/*
@@ -844,9 +920,15 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
* Note that the passed-in clause_list may be destructively modified! This
* is OK for current uses, because the clause_list is built by the caller for
* the sole purpose of passing to this function.
+ *
+ * (*extra_clauses) to be set to the right sides of baserestrictinfo clauses,
+ * looking like "x = const" if distinctness is derived from such clauses, not
+ * joininfo clauses. Pass NULL to the extra_clauses if this value is not
+ * needed.
*/
static bool
-rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
+rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list,
+ List **extra_clauses)
{
/*
* We could skip a couple of tests here if we assume all callers checked
@@ -859,10 +941,11 @@ rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
{
/*
* Examine the indexes to see if we have a matching unique index.
- * relation_has_unique_index_for automatically adds any usable
+ * relation_has_unique_index_ext automatically adds any usable
* restriction clauses for the rel, so we needn't do that here.
*/
- if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
+ if (relation_has_unique_index_ext(root, rel, clause_list, NIL, NIL,
+ extra_clauses))
return true;
}
else if (rel->rtekind == RTE_SUBQUERY)
@@ -1176,9 +1259,35 @@ innerrel_is_unique(PlannerInfo *root,
JoinType jointype,
List *restrictlist,
bool force_cache)
+{
+ return innerrel_is_unique_ext(root, joinrelids, outerrelids, innerrel,
+ jointype, restrictlist, force_cache, NULL);
+}
+
+/*
+ * innerrel_is_unique_ext
+ * Do the same as innerrel_is_unique(), but also set to (*extra_clauses)
+ * additional clauses from a baserestrictinfo list used to prove the
+ * uniqueness.
+ *
+ * A non-NULL extra_clauses indicates that we're checking for self-join and
+ * correspondingly dealing with filtered clauses.
+ */
+bool
+innerrel_is_unique_ext(PlannerInfo *root,
+ Relids joinrelids,
+ Relids outerrelids,
+ RelOptInfo *innerrel,
+ JoinType jointype,
+ List *restrictlist,
+ bool force_cache,
+ List **extra_clauses)
{
MemoryContext old_context;
ListCell *lc;
+ UniqueRelInfo *uniqueRelInfo;
+ List *outer_exprs = NIL;
+ bool self_join = (extra_clauses != NULL);
/* Certainly can't prove uniqueness when there are no joinclauses */
if (restrictlist == NIL)
@@ -1193,17 +1302,28 @@ innerrel_is_unique(PlannerInfo *root,
/*
* Query the cache to see if we've managed to prove that innerrel is
- * unique for any subset of this outerrel. We don't need an exact match,
- * as extra outerrels can't make the innerrel any less unique (or more
- * formally, the restrictlist for a join to a superset outerrel must be a
- * superset of the conditions we successfully used before).
+ * unique for any subset of this outerrel. For non-self-join search, we
+ * don't need an exact match, as extra outerrels can't make the innerrel
+ * any less unique (or more formally, the restrictlist for a join to a
+ * superset outerrel must be a superset of the conditions we successfully
+ * used before). For self-join search, we require an exact match of
+ * outerrels because we need extra clauses to be valid for our case. Also,
+ * for self-join checking we've filtered the clauses list. Thus, we can
+ * match only the result cached for a self-join search for another
+ * self-join check.
*/
foreach(lc, innerrel->unique_for_rels)
{
- Relids unique_for_rels = (Relids) lfirst(lc);
+ uniqueRelInfo = (UniqueRelInfo *) lfirst(lc);
- if (bms_is_subset(unique_for_rels, outerrelids))
+ if ((!self_join && bms_is_subset(uniqueRelInfo->outerrelids, outerrelids)) ||
+ (self_join && bms_equal(uniqueRelInfo->outerrelids, outerrelids) &&
+ uniqueRelInfo->self_join))
+ {
+ if (extra_clauses)
+ *extra_clauses = uniqueRelInfo->extra_clauses;
return true; /* Success! */
+ }
}
/*
@@ -1220,7 +1340,8 @@ innerrel_is_unique(PlannerInfo *root,
/* No cached information, so try to make the proof. */
if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
- jointype, restrictlist))
+ jointype, restrictlist,
+ self_join ? &outer_exprs : NULL))
{
/*
* Cache the positive result for future probes, being sure to keep it
@@ -1233,10 +1354,16 @@ innerrel_is_unique(PlannerInfo *root,
* supersets of them anyway.
*/
old_context = MemoryContextSwitchTo(root->planner_cxt);
+ uniqueRelInfo = makeNode(UniqueRelInfo);
+ uniqueRelInfo->outerrelids = bms_copy(outerrelids);
+ uniqueRelInfo->self_join = self_join;
+ uniqueRelInfo->extra_clauses = outer_exprs;
innerrel->unique_for_rels = lappend(innerrel->unique_for_rels,
- bms_copy(outerrelids));
+ uniqueRelInfo);
MemoryContextSwitchTo(old_context);
+ if (extra_clauses)
+ *extra_clauses = outer_exprs;
return true; /* Success! */
}
else
@@ -1282,7 +1409,8 @@ is_innerrel_unique_for(PlannerInfo *root,
Relids outerrelids,
RelOptInfo *innerrel,
JoinType jointype,
- List *restrictlist)
+ List *restrictlist,
+ List **extra_clauses)
{
List *clause_list = NIL;
ListCell *lc;
@@ -1312,17 +1440,895 @@ is_innerrel_unique_for(PlannerInfo *root,
continue; /* not mergejoinable */
/*
- * Check if clause has the form "outer op inner" or "inner op outer",
- * and if so mark which side is inner.
+ * Check if the clause has the form "outer op inner" or "inner op
+ * outer", and if so mark which side is inner.
*/
if (!clause_sides_match_join(restrictinfo, outerrelids,
innerrel->relids))
continue; /* no good for these input relations */
- /* OK, add to list */
+ /* OK, add to the list */
clause_list = lappend(clause_list, restrictinfo);
}
/* Let rel_is_distinct_for() do the hard work */
- return rel_is_distinct_for(root, innerrel, clause_list);
+ return rel_is_distinct_for(root, innerrel, clause_list, extra_clauses);
+}
+
+/*
+ * Update EC members to point to the remaining relation instead of the removed
+ * one, removing duplicates.
+ *
+ * Restriction clauses for base relations are already distributed to
+ * the respective baserestrictinfo lists (see
+ * generate_implied_equalities_for_column). The above code has already processed
+ * this list and updated these clauses to reference the remaining
+ * relation, so that we can skip them here based on their relids.
+ *
+ * Likewise, we have already processed the join clauses that join the
+ * removed relation to the remaining one.
+ *
+ * Finally, there might be join clauses tying the removed relation to
+ * some third relation. We can't just delete the source clauses and
+ * regenerate them from the EC because the corresponding equality
+ * operators might be missing (see the handling of ec_broken).
+ * Therefore, we will update the references in the source clauses.
+ *
+ * Derived clauses can be generated again, so it is simpler just to
+ * delete them.
+ */
+static void
+update_eclasses(EquivalenceClass *ec, int from, int to)
+{
+ List *new_members = NIL;
+ List *new_sources = NIL;
+
+ foreach_node(EquivalenceMember, em, ec->ec_members)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, em->em_relids))
+ {
+ new_members = lappend(new_members, em);
+ continue;
+ }
+
+ em->em_relids = adjust_relid_set(em->em_relids, from, to);
+ em->em_jdomain->jd_relids = adjust_relid_set(em->em_jdomain->jd_relids, from, to);
+
+ /* We only process inner joins */
+ ChangeVarNodes((Node *) em->em_expr, from, to, 0);
+
+ foreach_node(EquivalenceMember, other, new_members)
+ {
+ if (!equal(em->em_relids, other->em_relids))
+ continue;
+
+ if (equal(em->em_expr, other->em_expr))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_members = lappend(new_members, em);
+ }
+
+ list_free(ec->ec_members);
+ ec->ec_members = new_members;
+
+ list_free(ec->ec_derives);
+ ec->ec_derives = NULL;
+
+ /* Update EC source expressions */
+ foreach_node(RestrictInfo, rinfo, ec->ec_sources)
+ {
+ bool is_redundant = false;
+
+ if (!bms_is_member(from, rinfo->required_relids))
+ {
+ new_sources = lappend(new_sources, rinfo);
+ continue;
+ }
+
+ ChangeVarNodes((Node *) rinfo, from, to, 0);
+
+ /*
+ * After switching the clause to the remaining relation, check it for
+ * redundancy with existing ones. We don't have to check for
+ * redundancy with derived clauses, because we've just deleted them.
+ */
+ foreach_node(RestrictInfo, other, new_sources)
+ {
+ if (!equal(rinfo->clause_relids, other->clause_relids))
+ continue;
+
+ if (equal(rinfo->clause, other->clause))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+
+ if (!is_redundant)
+ new_sources = lappend(new_sources, rinfo);
+ }
+
+ list_free(ec->ec_sources);
+ ec->ec_sources = new_sources;
+ ec->ec_relids = adjust_relid_set(ec->ec_relids, from, to);
+}
+
+/*
+ * "Logically" compares two RestrictInfo's ignoring the 'rinfo_serial' field,
+ * which makes almost every RestrictInfo unique. This type of comparison is
+ * useful when removing duplicates while moving RestrictInfo's from removed
+ * relation to remaining relation during self-join elimination.
+ *
+ * XXX: In the future, we might remove the 'rinfo_serial' field completely and
+ * get rid of this function.
+ */
+static bool
+restrict_infos_logically_equal(RestrictInfo *a, RestrictInfo *b)
+{
+ int saved_rinfo_serial = a->rinfo_serial;
+ bool result;
+
+ a->rinfo_serial = b->rinfo_serial;
+ result = equal(a, b);
+ a->rinfo_serial = saved_rinfo_serial;
+
+ return result;
+}
+
+/*
+ * This function adds all non-redundant clauses to the keeping relation
+ * during self-join elimination. That is a contradictory operation. On the
+ * one hand, we reduce the length of the `restrict` lists, which can
+ * impact planning or executing time. Additionally, we improve the
+ * accuracy of cardinality estimation. On the other hand, it is one more
+ * place that can make planning time much longer in specific cases. It
+ * would have been better to avoid calling the equal() function here, but
+ * it's the only way to detect duplicated inequality expressions.
+ *
+ * (*keep_rinfo_list) is given by pointer because it might be altered by
+ * distribute_restrictinfo_to_rels().
+ */
+static void
+add_non_redundant_clauses(PlannerInfo *root,
+ List *rinfo_candidates,
+ List **keep_rinfo_list,
+ Index removed_relid)
+{
+ foreach_node(RestrictInfo, rinfo, rinfo_candidates)
+ {
+ bool is_redundant = false;
+
+ Assert(!bms_is_member(removed_relid, rinfo->required_relids));
+
+ foreach_node(RestrictInfo, src, (*keep_rinfo_list))
+ {
+ if (!bms_equal(src->clause_relids, rinfo->clause_relids))
+ /* Can't compare trivially different clauses */
+ continue;
+
+ if (src == rinfo ||
+ (rinfo->parent_ec != NULL &&
+ src->parent_ec == rinfo->parent_ec) ||
+ restrict_infos_logically_equal(rinfo, src))
+ {
+ is_redundant = true;
+ break;
+ }
+ }
+ if (!is_redundant)
+ distribute_restrictinfo_to_rels(root, rinfo);
+ }
+}
+
+/*
+ * Remove a relation after we have proven that it participates only in an
+ * unneeded unique self-join.
+ *
+ * Replace any links in planner info structures.
+ *
+ * Transfer join and restriction clauses from the removed relation to the
+ * remaining one. We change the Vars of the clause to point to the
+ * remaining relation instead of the removed one. The clauses that require
+ * a subset of joinrelids become restriction clauses of the remaining
+ * relation, and others remain join clauses. We append them to
+ * baserestrictinfo and joininfo, respectively, trying not to introduce
+ * duplicates.
+ *
+ * We also have to process the 'joinclauses' list here, because it
+ * contains EC-derived join clauses which must become filter clauses. It
+ * is not enough to just correct the ECs because the EC-derived
+ * restrictions are generated before join removal (see
+ * generate_base_implied_equalities).
+ *
+ * NOTE: Remember to keep the code in sync with PlannerInfo to be sure all
+ * cached relids and relid bitmapsets can be correctly cleaned during the
+ * self-join elimination procedure.
+ */
+static void
+remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
+ RelOptInfo *toKeep, RelOptInfo *toRemove,
+ List *restrictlist)
+{
+ List *joininfos;
+ ListCell *lc;
+ int i;
+ List *jinfo_candidates = NIL;
+ List *binfo_candidates = NIL;
+
+ Assert(toKeep->relid > 0);
+ Assert(toRemove->relid > 0);
+
+ /*
+ * Replace the index of the removing table with the keeping one. The
+ * technique of removing/distributing restrictinfo is used here to attach
+ * just appeared (for keeping relation) join clauses and avoid adding
+ * duplicates of those that already exist in the joininfo list.
+ */
+ joininfos = list_copy(toRemove->joininfo);
+ foreach_node(RestrictInfo, rinfo, joininfos)
+ {
+ remove_join_clause_from_rels(root, rinfo, rinfo->required_relids);
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Concatenate restrictlist to the list of base restrictions of the
+ * removing table just to simplify the replacement procedure: all of them
+ * weren't connected to any keeping relations and need to be added to some
+ * rels.
+ */
+ toRemove->baserestrictinfo = list_concat(toRemove->baserestrictinfo,
+ restrictlist);
+ foreach_node(RestrictInfo, rinfo, toRemove->baserestrictinfo)
+ {
+ ChangeVarNodes((Node *) rinfo, toRemove->relid, toKeep->relid, 0);
+
+ if (bms_membership(rinfo->required_relids) == BMS_MULTIPLE)
+ jinfo_candidates = lappend(jinfo_candidates, rinfo);
+ else
+ binfo_candidates = lappend(binfo_candidates, rinfo);
+ }
+
+ /*
+ * Now, add all non-redundant clauses to the keeping relation.
+ */
+ add_non_redundant_clauses(root, binfo_candidates,
+ &toKeep->baserestrictinfo, toRemove->relid);
+ add_non_redundant_clauses(root, jinfo_candidates,
+ &toKeep->joininfo, toRemove->relid);
+
+ list_free(binfo_candidates);
+ list_free(jinfo_candidates);
+
+ /*
+ * Arrange equivalence classes, mentioned removing a table, with the
+ * keeping one: varno of removing table should be replaced in members and
+ * sources lists. Also, remove duplicated elements if this replacement
+ * procedure created them.
+ */
+ i = -1;
+ while ((i = bms_next_member(toRemove->eclass_indexes, i)) >= 0)
+ {
+ EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+
+ update_eclasses(ec, toRemove->relid, toKeep->relid);
+ toKeep->eclass_indexes = bms_add_member(toKeep->eclass_indexes, i);
+ }
+
+ /*
+ * Transfer the targetlist and attr_needed flags.
+ */
+
+ foreach(lc, toRemove->reltarget->exprs)
+ {
+ Node *node = lfirst(lc);
+
+ ChangeVarNodes(node, toRemove->relid, toKeep->relid, 0);
+ if (!list_member(toKeep->reltarget->exprs, node))
+ toKeep->reltarget->exprs = lappend(toKeep->reltarget->exprs, node);
+ }
+
+ for (i = toKeep->min_attr; i <= toKeep->max_attr; i++)
+ {
+ int attno = i - toKeep->min_attr;
+
+ toRemove->attr_needed[attno] = adjust_relid_set(toRemove->attr_needed[attno],
+ toRemove->relid, toKeep->relid);
+ toKeep->attr_needed[attno] = bms_add_members(toKeep->attr_needed[attno],
+ toRemove->attr_needed[attno]);
+ }
+
+ /*
+ * If the removed relation has a row mark, transfer it to the remaining
+ * one.
+ *
+ * If both rels have row marks, just keep the one corresponding to the
+ * remaining relation because we verified earlier that they have the same
+ * strength.
+ */
+ if (rmark)
+ {
+ if (kmark)
+ {
+ Assert(kmark->markType == rmark->markType);
+
+ root->rowMarks = list_delete_ptr(root->rowMarks, rmark);
+ }
+ else
+ {
+ /* Shouldn't have inheritance children here. */
+ Assert(rmark->rti == rmark->prti);
+
+ rmark->rti = rmark->prti = toKeep->relid;
+ }
+ }
+
+ /*
+ * Replace varno in all the query structures, except nodes RangeTblRef
+ * otherwise later remove_rel_from_joinlist will yield errors.
+ */
+ ChangeVarNodesExtended((Node *) root->parse, toRemove->relid, toKeep->relid, 0, false);
+
+ /* Replace links in the planner info */
+ remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
+
+ /* At last, replace varno in root targetlist and HAVING clause */
+ ChangeVarNodes((Node *) root->processed_tlist, toRemove->relid, toKeep->relid, 0);
+ ChangeVarNodes((Node *) root->processed_groupClause, toRemove->relid, toKeep->relid, 0);
+
+ adjust_relid_set(root->all_result_relids, toRemove->relid, toKeep->relid);
+ adjust_relid_set(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
+ /*
+ * There may be references to the rel in root->fkey_list, but if so,
+ * match_foreign_keys_to_quals() will get rid of them.
+ */
+
+ /*
+ * Finally, remove the rel from the baserel array to prevent it from being
+ * referenced again. (We can't do this earlier because
+ * remove_join_clause_from_rels will touch it.)
+ */
+ root->simple_rel_array[toRemove->relid] = NULL;
+
+ /* And nuke the RelOptInfo, just in case there's another access path. */
+ pfree(toRemove);
+
+ /*
+ * Now repeat construction of attr_needed bits coming from all other
+ * sources.
+ */
+ rebuild_placeholder_attr_needed(root);
+ rebuild_joinclause_attr_needed(root);
+ rebuild_eclass_attr_needed(root);
+ rebuild_lateral_attr_needed(root);
+}
+
+/*
+ * split_selfjoin_quals
+ * Processes 'joinquals' by building two lists: one containing the quals
+ * where the columns/exprs are on either side of the join match and
+ * another one containing the remaining quals.
+ *
+ * 'joinquals' must only contain quals for a RTE_RELATION being joined to
+ * itself.
+ */
+static void
+split_selfjoin_quals(PlannerInfo *root, List *joinquals, List **selfjoinquals,
+ List **otherjoinquals, int from, int to)
+{
+ List *sjoinquals = NIL;
+ List *ojoinquals = NIL;
+
+ foreach_node(RestrictInfo, rinfo, joinquals)
+ {
+ OpExpr *expr;
+ Node *leftexpr;
+ Node *rightexpr;
+
+ /* In general, clause looks like F(arg1) = G(arg2) */
+ if (!rinfo->mergeopfamilies ||
+ bms_num_members(rinfo->clause_relids) != 2 ||
+ bms_membership(rinfo->left_relids) != BMS_SINGLETON ||
+ bms_membership(rinfo->right_relids) != BMS_SINGLETON)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ expr = (OpExpr *) rinfo->clause;
+
+ if (!IsA(expr, OpExpr) || list_length(expr->args) != 2)
+ {
+ ojoinquals = lappend(ojoinquals, rinfo);
+ continue;
+ }
+
+ leftexpr = get_leftop(rinfo->clause);
+ rightexpr = copyObject(get_rightop(rinfo->clause));
+
+ if (leftexpr && IsA(leftexpr, RelabelType))
+ leftexpr = (Node *) ((RelabelType *) leftexpr)->arg;
+ if (rightexpr && IsA(rightexpr, RelabelType))
+ rightexpr = (Node *) ((RelabelType *) rightexpr)->arg;
+
+ /*
+ * Quite an expensive operation, narrowing the use case. For example,
+ * when we have cast of the same var to different (but compatible)
+ * types.
+ */
+ ChangeVarNodes(rightexpr, bms_singleton_member(rinfo->right_relids),
+ bms_singleton_member(rinfo->left_relids), 0);
+
+ if (equal(leftexpr, rightexpr))
+ sjoinquals = lappend(sjoinquals, rinfo);
+ else
+ ojoinquals = lappend(ojoinquals, rinfo);
+ }
+
+ *selfjoinquals = sjoinquals;
+ *otherjoinquals = ojoinquals;
+}
+
+/*
+ * Check for a case when uniqueness is at least partly derived from a
+ * baserestrictinfo clause. In this case, we have a chance to return only
+ * one row (if such clauses on both sides of SJ are equal) or nothing (if they
+ * are different).
+ */
+static bool
+match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
+ Index relid)
+{
+ foreach_node(RestrictInfo, rinfo, uclauses)
+ {
+ Expr *clause;
+ Node *iclause;
+ Node *c1;
+ bool matched = false;
+
+ Assert(outer->relid > 0 && relid > 0);
+
+ /* Only filters like f(R.x1,...,R.xN) == expr we should consider. */
+ Assert(bms_is_empty(rinfo->left_relids) ^
+ bms_is_empty(rinfo->right_relids));
+
+ clause = (Expr *) copyObject(rinfo->clause);
+ ChangeVarNodes((Node *) clause, relid, outer->relid, 0);
+
+ iclause = bms_is_empty(rinfo->left_relids) ? get_rightop(clause) :
+ get_leftop(clause);
+ c1 = bms_is_empty(rinfo->left_relids) ? get_leftop(clause) :
+ get_rightop(clause);
+
+ /*
+ * Compare these left and right sides with the corresponding sides of
+ * the outer's filters. If no one is detected - return immediately.
+ */
+ foreach_node(RestrictInfo, orinfo, outer->baserestrictinfo)
+ {
+ Node *oclause;
+ Node *c2;
+
+ if (orinfo->mergeopfamilies == NIL)
+ /* Don't consider clauses that aren't similar to 'F(X)=G(Y)' */
+ continue;
+
+ Assert(is_opclause(orinfo->clause));
+
+ oclause = bms_is_empty(orinfo->left_relids) ?
+ get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
+ c2 = (bms_is_empty(orinfo->left_relids) ?
+ get_leftop(orinfo->clause) : get_rightop(orinfo->clause));
+
+ if (equal(iclause, oclause) && equal(c1, c2))
+ {
+ matched = true;
+ break;
+ }
+ }
+
+ if (!matched)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Find and remove unique self-joins in a group of base relations that have
+ * the same Oid.
+ *
+ * Returns a set of relids that were removed.
+ */
+static Relids
+remove_self_joins_one_group(PlannerInfo *root, Relids relids)
+{
+ Relids result = NULL;
+ int k; /* Index of kept relation */
+ int r = -1; /* Index of removed relation */
+
+ while ((r = bms_next_member(relids, r)) > 0)
+ {
+ RelOptInfo *inner = root->simple_rel_array[r];
+
+ k = r;
+
+ while ((k = bms_next_member(relids, k)) > 0)
+ {
+ Relids joinrelids = NULL;
+ RelOptInfo *outer = root->simple_rel_array[k];
+ List *restrictlist;
+ List *selfjoinquals;
+ List *otherjoinquals;
+ ListCell *lc;
+ bool jinfo_check = true;
+ PlanRowMark *omark = NULL;
+ PlanRowMark *imark = NULL;
+ List *uclauses = NIL;
+
+ /* A sanity check: the relations have the same Oid. */
+ Assert(root->simple_rte_array[k]->relid ==
+ root->simple_rte_array[r]->relid);
+
+ /*
+ * It is impossible to eliminate the join of two relations if they
+ * belong to different rules of order. Otherwise, the planner
+ * can't find any variants of the correct query plan.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *info = (SpecialJoinInfo *) lfirst(lc);
+
+ if ((bms_is_member(k, info->syn_lefthand) ^
+ bms_is_member(r, info->syn_lefthand)) ||
+ (bms_is_member(k, info->syn_righthand) ^
+ bms_is_member(r, info->syn_righthand)))
+ {
+ jinfo_check = false;
+ break;
+ }
+ }
+ if (!jinfo_check)
+ continue;
+
+ /*
+ * Check Row Marks equivalence. We can't remove the join if the
+ * relations have row marks of different strength (e.g., one is
+ * locked FOR UPDATE, and another just has ROW_MARK_REFERENCE for
+ * EvalPlanQual rechecking).
+ */
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rowMark = (PlanRowMark *) lfirst(lc);
+
+ if (rowMark->rti == k)
+ {
+ Assert(imark == NULL);
+ imark = rowMark;
+ }
+ else if (rowMark->rti == r)
+ {
+ Assert(omark == NULL);
+ omark = rowMark;
+ }
+
+ if (omark && imark)
+ break;
+ }
+ if (omark && imark && omark->markType != imark->markType)
+ continue;
+
+ /*
+ * We only deal with base rels here, so their relids bitset
+ * contains only one member -- their relid.
+ */
+ joinrelids = bms_add_member(joinrelids, r);
+ joinrelids = bms_add_member(joinrelids, k);
+
+ /*
+ * PHVs should not impose any constraints on removing self-joins.
+ */
+
+ /*
+ * At this stage, joininfo lists of inner and outer can contain
+ * only clauses required for a superior outer join that can't
+ * influence this optimization. So, we can avoid to call the
+ * build_joinrel_restrictlist() routine.
+ */
+ restrictlist = generate_join_implied_equalities(root, joinrelids,
+ inner->relids,
+ outer, NULL);
+ if (restrictlist == NIL)
+ continue;
+
+ /*
+ * Process restrictlist to separate the self-join quals from the
+ * other quals. e.g., "x = x" goes to selfjoinquals and "a = b" to
+ * otherjoinquals.
+ */
+ split_selfjoin_quals(root, restrictlist, &selfjoinquals,
+ &otherjoinquals, inner->relid, outer->relid);
+
+ Assert(list_length(restrictlist) ==
+ (list_length(selfjoinquals) + list_length(otherjoinquals)));
+
+ /*
+ * To enable SJE for the only degenerate case without any self
+ * join clauses at all, add baserestrictinfo to this list. The
+ * degenerate case works only if both sides have the same clause.
+ * So doesn't matter which side to add.
+ */
+ selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);
+
+ /*
+ * Determine if the inner table can duplicate outer rows. We must
+ * bypass the unique rel cache here since we're possibly using a
+ * subset of join quals. We can use 'force_cache' == true when all
+ * join quals are self-join quals. Otherwise, we could end up
+ * putting false negatives in the cache.
+ */
+ if (!innerrel_is_unique_ext(root, joinrelids, inner->relids,
+ outer, JOIN_INNER, selfjoinquals,
+ list_length(otherjoinquals) == 0,
+ &uclauses))
+ continue;
+
+ /*
+ * 'uclauses' is the copy of outer->baserestrictinfo that are
+ * associated with an index. We proved by matching selfjoinquals
+ * to a unique index that the outer relation has at most one
+ * matching row for each inner row. Sometimes that is not enough.
+ * e.g. "WHERE s1.b = s2.b AND s1.a = 1 AND s2.a = 2" when the
+ * unique index is (a,b). Having non-empty uclauses, we must
+ * validate that the inner baserestrictinfo contains the same
+ * expressions, or we won't match the same row on each side of the
+ * join.
+ */
+ if (!match_unique_clauses(root, inner, uclauses, outer->relid))
+ continue;
+
+ /*
+ * We can remove either relation, so remove the inner one in order
+ * to simplify this loop.
+ */
+ remove_self_join_rel(root, omark, imark, outer, inner, restrictlist);
+
+ result = bms_add_member(result, r);
+
+ /* We have removed the outer relation, try the next one. */
+ break;
+ }
+ }
+
+ return result;
+}
+
+/*
+ * Gather indexes of base relations from the joinlist and try to eliminate self
+ * joins.
+ */
+static Relids
+remove_self_joins_recurse(PlannerInfo *root, List *joinlist, Relids toRemove)
+{
+ ListCell *jl;
+ Relids relids = NULL;
+ SelfJoinCandidate *candidates = NULL;
+ int i;
+ int j;
+ int numRels;
+
+ /* Collect indexes of base relations of the join tree */
+ foreach(jl, joinlist)
+ {
+ Node *jlnode = (Node *) lfirst(jl);
+
+ if (IsA(jlnode, RangeTblRef))
+ {
+ int varno = ((RangeTblRef *) jlnode)->rtindex;
+ RangeTblEntry *rte = root->simple_rte_array[varno];
+
+ /*
+ * We only consider ordinary relations as candidates to be
+ * removed, and these relations should not have TABLESAMPLE
+ * clauses specified. Removing a relation with TABLESAMPLE clause
+ * could potentially change the syntax of the query. Because of
+ * UPDATE/DELETE EPQ mechanism, currently Query->resultRelation or
+ * Query->mergeTargetRelation associated rel cannot be eliminated.
+ */
+ if (rte->rtekind == RTE_RELATION &&
+ rte->relkind == RELKIND_RELATION &&
+ rte->tablesample == NULL &&
+ varno != root->parse->resultRelation &&
+ varno != root->parse->mergeTargetRelation)
+ {
+ Assert(!bms_is_member(varno, relids));
+ relids = bms_add_member(relids, varno);
+ }
+ }
+ else if (IsA(jlnode, List))
+ {
+ /* Recursively go inside the sub-joinlist */
+ toRemove = remove_self_joins_recurse(root, (List *) jlnode,
+ toRemove);
+ }
+ else
+ elog(ERROR, "unrecognized joinlist node type: %d",
+ (int) nodeTag(jlnode));
+ }
+
+ numRels = bms_num_members(relids);
+
+ /* Need at least two relations for the join */
+ if (numRels < 2)
+ return toRemove;
+
+ /*
+ * In order to find relations with the same oid we first build an array of
+ * candidates and then sort it by oid.
+ */
+ candidates = (SelfJoinCandidate *) palloc(sizeof(SelfJoinCandidate) *
+ numRels);
+ i = -1;
+ j = 0;
+ while ((i = bms_next_member(relids, i)) >= 0)
+ {
+ candidates[j].relid = i;
+ candidates[j].reloid = root->simple_rte_array[i]->relid;
+ j++;
+ }
+
+ qsort(candidates, numRels, sizeof(SelfJoinCandidate),
+ self_join_candidates_cmp);
+
+ /*
+ * Iteratively form a group of relation indexes with the same oid and
+ * launch the routine that detects self-joins in this group and removes
+ * excessive range table entries.
+ *
+ * At the end of the iteration, exclude the group from the overall relids
+ * list. So each next iteration of the cycle will involve less and less
+ * value of relids.
+ */
+ i = 0;
+ for (j = 1; j < numRels + 1; j++)
+ {
+ if (j == numRels || candidates[j].reloid != candidates[i].reloid)
+ {
+ if (j - i >= 2)
+ {
+ /* Create a group of relation indexes with the same oid */
+ Relids group = NULL;
+ Relids removed;
+
+ while (i < j)
+ {
+ group = bms_add_member(group, candidates[i].relid);
+ i++;
+ }
+ relids = bms_del_members(relids, group);
+
+ /*
+ * Try to remove self-joins from a group of identical entries.
+ * Make the next attempt iteratively - if something is deleted
+ * from a group, changes in clauses and equivalence classes
+ * can give us a chance to find more candidates.
+ */
+ do
+ {
+ Assert(!bms_overlap(group, toRemove));
+ removed = remove_self_joins_one_group(root, group);
+ toRemove = bms_add_members(toRemove, removed);
+ group = bms_del_members(group, removed);
+ } while (!bms_is_empty(removed) &&
+ bms_membership(group) == BMS_MULTIPLE);
+ bms_free(removed);
+ bms_free(group);
+ }
+ else
+ {
+ /* Single relation, just remove it from the set */
+ relids = bms_del_member(relids, candidates[i].relid);
+ i = j;
+ }
+ }
+ }
+
+ Assert(bms_is_empty(relids));
+
+ return toRemove;
+}
+
+/*
+ * Compare self-join candidates by their oids.
+ */
+static int
+self_join_candidates_cmp(const void *a, const void *b)
+{
+ const SelfJoinCandidate *ca = (const SelfJoinCandidate *) a;
+ const SelfJoinCandidate *cb = (const SelfJoinCandidate *) b;
+
+ if (ca->reloid != cb->reloid)
+ return (ca->reloid < cb->reloid ? -1 : 1);
+ else
+ return 0;
+}
+
+/*
+ * Find and remove useless self joins.
+ *
+ * Search for joins where a relation is joined to itself. If the join clause
+ * for each tuple from one side of the join is proven to match the same
+ * physical row (or nothing) on the other side, that self-join can be
+ * eliminated from the query. Suitable join clauses are assumed to be in the
+ * form of X = X, and can be replaced with NOT NULL clauses.
+ *
+ * For the sake of simplicity, we don't apply this optimization to special
+ * joins. Here is a list of what we could do in some particular cases:
+ * 'a a1 semi join a a2': is reduced to inner by reduce_unique_semijoins,
+ * and then removed normally.
+ * 'a a1 anti join a a2': could simplify to a scan with 'outer quals AND
+ * (IS NULL on join columns OR NOT inner quals)'.
+ * 'a a1 left join a a2': could simplify to a scan like inner but without
+ * NOT NULL conditions on join columns.
+ * 'a a1 left join (a a2 join b)': can't simplify this, because join to b
+ * can both remove rows and introduce duplicates.
+ *
+ * To search for removable joins, we order all the relations on their Oid,
+ * go over each set with the same Oid, and consider each pair of relations
+ * in this set.
+ *
+ * To remove the join, we mark one of the participating relations as dead
+ * and rewrite all references to it to point to the remaining relation.
+ * This includes modifying RestrictInfos, EquivalenceClasses, and
+ * EquivalenceMembers. We also have to modify the row marks. The join clauses
+ * of the removed relation become either restriction or join clauses, based on
+ * whether they reference any relations not participating in the removed join.
+ *
+ * 'joinlist' is the top-level joinlist of the query. If it has any
+ * references to the removed relations, we update them to point to the
+ * remaining ones.
+ */
+List *
+remove_useless_self_joins(PlannerInfo *root, List *joinlist)
+{
+ Relids toRemove = NULL;
+ int relid = -1;
+
+ if (!enable_self_join_elimination || joinlist == NIL ||
+ (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List)))
+ return joinlist;
+
+ /*
+ * Merge pairs of relations participated in self-join. Remove unnecessary
+ * range table entries.
+ */
+ toRemove = remove_self_joins_recurse(root, joinlist, toRemove);
+
+ if (unlikely(toRemove != NULL))
+ {
+ /* At the end, remove orphaned relation links */
+ while ((relid = bms_next_member(toRemove, relid)) >= 0)
+ {
+ int nremoved = 0;
+
+ joinlist = remove_rel_from_joinlist(joinlist, relid, &nremoved);
+ if (nremoved != 1)
+ elog(ERROR, "failed to find relation %d in joinlist", relid);
+ }
+ }
+
+ return joinlist;
}
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index ade23fd9d56..5467e094ca7 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -233,6 +233,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
+ /*
+ * Remove self joins on a unique column.
+ */
+ joinlist = remove_useless_self_joins(root, joinlist);
+
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 7c27dc24e21..eab44da65b8 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -639,14 +639,17 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
- * using the subquery's original targetlist expressions, not the
+ * using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
* generate_append_tlist, and those would confuse estimate_num_groups
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
- * be an RTE corresponding to each setop's output.
+ * be an RTE corresponding to each setop's output. Note, we use this not
+ * subquery's targetlist but subroot->parse's targetlist, because it was
+ * revised by self-join removal. subquery's targetlist might contain the
+ * references to the removed relids.
*/
if (pNumGroups)
{
@@ -659,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);
diff --git a/src/backend/rewrite/rewriteManip.c b/src/backend/rewrite/rewriteManip.c
index a115b217c91..9433548d279 100644
--- a/src/backend/rewrite/rewriteManip.c
+++ b/src/backend/rewrite/rewriteManip.c
@@ -64,7 +64,6 @@ static bool locate_windowfunc_walker(Node *node,
locate_windowfunc_context *context);
static bool checkExprHasSubLink_walker(Node *node, void *context);
static Relids offset_relid_set(Relids relids, int offset);
-static Relids adjust_relid_set(Relids relids, int oldrelid, int newrelid);
static Node *add_nulling_relids_mutator(Node *node,
add_nulling_relids_context *context);
static Node *remove_nulling_relids_mutator(Node *node,
@@ -543,6 +542,8 @@ offset_relid_set(Relids relids, int offset)
* (identified by sublevels_up and rt_index), and change their varno fields
* to 'new_index'. The varnosyn fields are changed too. Also, adjust other
* nodes that contain rangetable indexes, such as RangeTblRef and JoinExpr.
+ * Specifying 'change_RangeTblRef' to false allows skipping RangeTblRef.
+ * See ChangeVarNodesExtended for details.
*
* NOTE: although this has the form of a walker, we cheat and modify the
* nodes in-place. The given expression tree should have been copied
@@ -554,6 +555,7 @@ typedef struct
int rt_index;
int new_index;
int sublevels_up;
+ bool change_RangeTblRef;
} ChangeVarNodes_context;
static bool
@@ -586,7 +588,7 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
cexpr->cvarno = context->new_index;
return false;
}
- if (IsA(node, RangeTblRef))
+ if (IsA(node, RangeTblRef) && context->change_RangeTblRef)
{
RangeTblRef *rtr = (RangeTblRef *) node;
@@ -633,6 +635,75 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
+ if (IsA(node, RestrictInfo))
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) node;
+ int relid = -1;
+ bool is_req_equal =
+ (rinfo->required_relids == rinfo->clause_relids);
+ bool clause_relids_is_multiple =
+ (bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
+
+ if (bms_is_member(context->rt_index, rinfo->clause_relids))
+ {
+ expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
+ expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
+
+ rinfo->clause_relids =
+ adjust_relid_set(rinfo->clause_relids, context->rt_index, context->new_index);
+ rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
+ rinfo->left_relids =
+ adjust_relid_set(rinfo->left_relids, context->rt_index, context->new_index);
+ rinfo->right_relids =
+ adjust_relid_set(rinfo->right_relids, context->rt_index, context->new_index);
+ }
+
+ if (is_req_equal)
+ rinfo->required_relids = rinfo->clause_relids;
+ else
+ rinfo->required_relids =
+ adjust_relid_set(rinfo->required_relids, context->rt_index, context->new_index);
+
+ rinfo->outer_relids =
+ adjust_relid_set(rinfo->outer_relids, context->rt_index, context->new_index);
+ rinfo->incompatible_relids =
+ adjust_relid_set(rinfo->incompatible_relids, context->rt_index, context->new_index);
+
+ if (rinfo->mergeopfamilies &&
+ bms_get_singleton_member(rinfo->clause_relids, &relid) &&
+ clause_relids_is_multiple &&
+ relid == context->new_index && IsA(rinfo->clause, OpExpr))
+ {
+ Expr *leftOp;
+ Expr *rightOp;
+
+ leftOp = (Expr *) get_leftop(rinfo->clause);
+ rightOp = (Expr *) get_rightop(rinfo->clause);
+
+ /*
+ * For self-join elimination, changing varnos could transform
+ * "t1.a = t2.a" into "t1.a = t1.a". That is always true as long
+ * as "t1.a" is not null. We use qual() to check for such a case,
+ * and then we replace the qual for a check for not null
+ * (NullTest).
+ */
+ if (leftOp != NULL && equal(leftOp, rightOp))
+ {
+ NullTest *ntest = makeNode(NullTest);
+
+ ntest->arg = leftOp;
+ ntest->nulltesttype = IS_NOT_NULL;
+ ntest->argisrow = false;
+ ntest->location = -1;
+ rinfo->clause = (Expr *) ntest;
+ rinfo->mergeopfamilies = NIL;
+ rinfo->left_em = NULL;
+ rinfo->right_em = NULL;
+ }
+ Assert(rinfo->orclause == NULL);
+ }
+ return false;
+ }
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
@@ -665,32 +736,32 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
return expression_tree_walker(node, ChangeVarNodes_walker, context);
}
+/*
+ * ChangeVarNodesExtended - similar to ChangeVarNodes, but has additional
+ * 'change_RangeTblRef' param
+ *
+ * ChangeVarNodes changes a given node and all of its underlying nodes.
+ * However, self-join elimination (SJE) needs to skip the RangeTblRef node
+ * type. During SJE's last step, remove_rel_from_joinlist() removes
+ * remaining RangeTblRefs with target relid. If ChangeVarNodes() replaces
+ * the target relid before, remove_rel_from_joinlist() fails to identify
+ * the nodes to delete.
+ */
void
-ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef)
{
ChangeVarNodes_context context;
context.rt_index = rt_index;
context.new_index = new_index;
context.sublevels_up = sublevels_up;
+ context.change_RangeTblRef = change_RangeTblRef;
- /*
- * Must be prepared to start with a Query or a bare expression tree; if
- * it's a Query, go straight to query_tree_walker to make sure that
- * sublevels_up doesn't get incremented prematurely.
- */
if (node && IsA(node, Query))
{
Query *qry = (Query *) node;
- /*
- * If we are starting at a Query, and sublevels_up is zero, then we
- * must also fix rangetable indexes in the Query itself --- namely
- * resultRelation, mergeTargetRelation, exclRelIndex and rowMarks
- * entries. sublevels_up cannot be zero when recursing into a
- * subquery, so there's no need to have the same logic inside
- * ChangeVarNodes_walker.
- */
if (sublevels_up == 0)
{
ListCell *l;
@@ -701,7 +772,6 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
if (qry->mergeTargetRelation == rt_index)
qry->mergeTargetRelation = new_index;
- /* this is unlikely to ever be used, but ... */
if (qry->onConflict && qry->onConflict->exclRelIndex == rt_index)
qry->onConflict->exclRelIndex = new_index;
@@ -719,15 +789,22 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
ChangeVarNodes_walker(node, &context);
}
+void
+ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
+{
+ ChangeVarNodesExtended(node, rt_index, new_index, sublevels_up, true);
+}
+
/*
- * Substitute newrelid for oldrelid in a Relid set
+ * adjust_relid_set - substitute newrelid for oldrelid in a Relid set
*
- * Note: some extensions may pass a special varno such as INDEX_VAR for
- * oldrelid. bms_is_member won't like that, but we should tolerate it.
- * (Perhaps newrelid could also be a special varno, but there had better
- * not be a reason to inject that into a nullingrels or phrels set.)
+ * Attempt to remove oldrelid from a Relid set (as long as it's not a special
+ * varno). If oldrelid was found and removed, insert newrelid into a Relid
+ * set (as long as it's not a special varno). Therefore, when oldrelid is
+ * a special varno, this function does nothing. When newrelid is a special
+ * varno, this function behaves as delete.
*/
-static Relids
+Relids
adjust_relid_set(Relids relids, int oldrelid, int newrelid)
{
if (!IS_SPECIAL_VARNO(oldrelid) && bms_is_member(oldrelid, relids))
@@ -736,7 +813,8 @@ adjust_relid_set(Relids relids, int oldrelid, int newrelid)
relids = bms_copy(relids);
/* Remove old, add new */
relids = bms_del_member(relids, oldrelid);
- relids = bms_add_member(relids, newrelid);
+ if (!IS_SPECIAL_VARNO(newrelid))
+ relids = bms_add_member(relids, newrelid);
}
return relids;
}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 42728189322..cce73314609 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -988,6 +988,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_self_join_elimination", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable removal of unique self-joins."),
+ NULL,
+ GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
+ },
+ &enable_self_join_elimination,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 00c700cc3e7..fbf05322c75 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -201,6 +201,11 @@ typedef struct PlannerGlobal
* Not all fields are printed. (In some cases, there is no print support for
* the field type; in others, doing so would lead to infinite recursion or
* bloat dump output more than seems useful.)
+ *
+ * NOTE: When adding new entries containing relids and relid bitmapsets,
+ * remember to check that they will be correctly processed by
+ * the remove_self_join_rel function - relid of removing relation will be
+ * correctly replaced with the keeping one.
*----------
*/
#ifndef HAVE_PLANNERINFO_TYPEDEF
@@ -753,7 +758,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
- * unique_for_rels - list of Relid sets, each one being a set of other
+ * unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@@ -992,7 +997,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
- /* known unique for these other relid set(s) */
+ /* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@@ -3463,4 +3468,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
+/*
+ * UniqueRelInfo caches a fact that a relation is unique when being joined
+ * to other relation(s).
+ */
+typedef struct UniqueRelInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * The relation in consideration is unique when being joined with this set
+ * of other relation(s).
+ */
+ Relids outerrelids;
+
+ /*
+ * The relation in consideration is unique when considering only clauses
+ * suitable for self-join (passed split_selfjoin_quals()).
+ */
+ bool self_join;
+
+ /*
+ * Additional clauses from a baserestrictinfo list that were used to prove
+ * the uniqueness. We cache it for the self-join checking procedure: a
+ * self-join can be removed if the outer relation contains strictly the
+ * same set of clauses.
+ */
+ List *extra_clauses;
+} UniqueRelInfo;
+
#endif /* PATHNODES_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index bcf8ed645c2..78e05d88c8e 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -192,6 +192,8 @@ extern SortGroupClause *get_sortgroupref_clause_noerr(Index sortref,
* output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0020 /* recurse into PlaceHolderVar
* arguments */
+#define PVC_INCLUDE_CONVERTROWTYPES 0x0040 /* include ConvertRowtypeExprs in
+ * output list */
extern Bitmapset *pull_varnos(PlannerInfo *root, Node *node);
extern Bitmapset *pull_varnos_of_level(PlannerInfo *root, Node *node, int levelsup);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 46955d128f0..bc5dfd7db41 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
+extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist, List *exprlist,
+ List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index fee3378bbe3..5a930199611 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
+extern PGDLLIMPORT bool enable_self_join_elimination;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@@ -113,6 +114,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
+extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
+ Relids outerrelids, RelOptInfo *innerrel,
+ JoinType jointype, List *restrictlist,
+ bool force_cache, List **uclauses);
+extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c
diff --git a/src/include/rewrite/rewriteManip.h b/src/include/rewrite/rewriteManip.h
index 512823033b9..5ec475c63e9 100644
--- a/src/include/rewrite/rewriteManip.h
+++ b/src/include/rewrite/rewriteManip.h
@@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@@ -41,11 +42,14 @@ typedef enum ReplaceVarsNoMatchOption
} ReplaceVarsNoMatchOption;
+extern Relids adjust_relid_set(Relids relids, int oldrelid, int newrelid);
extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
List *src_rtable, List *src_perminfos);
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
extern void ChangeVarNodes(Node *node, int rt_index, int new_index,
int sublevels_up);
+extern void ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
+ int sublevels_up, bool change_RangeTblRef);
extern void IncrementVarSublevelsUp(Node *node, int delta_sublevels_up,
int min_sublevels_up);
extern void IncrementVarSublevelsUp_rtable(List *rtable,
diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 56227505009..ad8ab294ff6 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -434,6 +434,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: ((n.ff + n.ff) = p.f1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
+ -> Seq Scan on ec0 n
+ -> Materialize
+ -> Seq Scan on ec1 p
+(5 rows)
+
+reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 3ffc066b1f8..a57bb18c24f 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5966,6 +5966,27 @@ select c.id, ss.a from c
-> Seq Scan on c
(7 rows)
+-- check the case when the placeholder relates to an outer join and its
+-- inner in the press field but actually uses only the outer side of the join
+explain (costs off)
+SELECT q.val FROM b LEFT JOIN (
+ SELECT (q1.z IS NOT NULL) AS val
+ FROM b LEFT JOIN (
+ SELECT (t1.b_id IS NOT NULL) AS z FROM a t1 LEFT JOIN a t2 USING (id)
+ ) AS q1
+ ON true
+) AS q ON true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on b
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on b b_1
+ -> Materialize
+ -> Seq Scan on a t1
+(7 rows)
+
CREATE TEMP TABLE parted_b (id int PRIMARY KEY) partition by range(id);
CREATE TEMP TABLE parted_b1 partition of parted_b for values from (0) to (10);
-- test join removals on a partitioned table
@@ -6377,6 +6398,1068 @@ select * from
----+----+----+----
(0 rows)
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = (a - 1)))
+(2 rows)
+
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b < 10))
+(2 rows)
+
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+ a | b | c
+---+---+---
+ 2 | 1 | 1
+(1 row)
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.c)
+ -> Seq Scan on sj t2
+ -> Materialize
+ -> Seq Scan on sj t1
+ Filter: (b IS NOT NULL)
+(6 rows)
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ -> Seq Scan on sj t1
+ -> Memoize
+ Cache Key: t1.a, t1.b
+ Cache Mode: binary
+ -> Sample Scan on sj
+ Sampling: system (t1.b)
+ Filter: (t1.a = a)
+(8 rows)
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+ QUERY PLAN
+---------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+(2 rows)
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+(6 rows)
+
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t3.b)
+ -> Nested Loop
+ Join Filter: (t1.a = t2.b)
+ -> Seq Scan on sj t1
+ Filter: (a = b)
+ -> Seq Scan on sj t2
+ Filter: (b = a)
+ -> Seq Scan on sj t3
+ Filter: (b = a)
+(10 rows)
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Seq Scan on sj t3
+ Filter: ((a IS NOT NULL) AND (b IS NOT NULL) AND ((b + 1) IS NOT NULL))
+(2 rows)
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on sj t2
+ Filter: (a IS NOT NULL)
+ SubPlan 1
+ -> Result
+ One-Time Filter: (t2.a = t2.a)
+ -> Seq Scan on sj
+ Filter: (a = t2.a)
+(7 rows)
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: (y.a = z.q1)
+ -> Seq Scan on sj y
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl z
+(6 rows)
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+ Join Filter: (t2.a = t4.a)
+ -> Seq Scan on sj t2
+ Filter: ((b IS NULL) AND (a IS NOT NULL) AND ((c * c) = (c + 2)))
+ -> Seq Scan on sj t4
+ Filter: (c IS NOT NULL)
+(6 rows)
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Left Join
+ -> Result
+ -> Nested Loop Left Join
+ -> Seq Scan on sj j2
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on int8_tbl y
+(7 rows)
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: (t2.a)
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Result
+ Output: t2.a
+(7 rows)
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+ QUERY PLAN
+------------------------------------
+ Nested Loop
+ Output: t3.a
+ -> Seq Scan on public.sj t2
+ Output: t2.a, t2.b, t2.c
+ Filter: (t2.a IS NOT NULL)
+ -> Seq Scan on public.sj t3
+ Output: t3.a
+(7 rows)
+
+-- Check updating of semi_rhs_exprs links from upper-level semi join to
+-- the removing relation
+explain (verbose, costs off)
+select t1.a from sj t1 where t1.b in (
+ select t2.b from sj t2 join sj t3 on t2.c=t3.c);
+ QUERY PLAN
+------------------------------------------
+ Nested Loop Semi Join
+ Output: t1.a
+ Join Filter: (t1.b = t3.b)
+ -> Seq Scan on public.sj t1
+ Output: t1.a, t1.b, t1.c
+ -> Materialize
+ Output: t3.c, t3.b
+ -> Seq Scan on public.sj t3
+ Output: t3.c, t3.b
+ Filter: (t3.c IS NOT NULL)
+(10 rows)
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+INSERT INTO sj VALUES (3, 1, 3);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+ Filter: (a = 3)
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 2 | 1 | 1 | 2 | 1 | 1
+(1 row)
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ QUERY PLAN
+-----------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 1))
+(2 rows)
+
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ QUERY PLAN
+---------------------------------
+ Nested Loop
+ Join Filter: (t1.b = t2.b)
+ -> Seq Scan on sj t2
+ Filter: (a = 42)
+ -> Seq Scan on sj t1
+ Filter: (a IS NOT NULL)
+(6 rows)
+
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+ a | b | c
+---+---+---
+(0 rows)
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+ QUERY PLAN
+-----------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = 1))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((a * a) = 1)
+ -> Seq Scan on sj j2
+ Filter: ((a * a) = 2)
+(6 rows)
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND ((a * a) = (((EXTRACT(dow FROM CURRENT_TIMESTAMP(0)) / '15'::numeric) + '3'::numeric))::integer))
+(2 rows)
+
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+(0 rows)
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (((a * c) / 3) = (((random() / '3'::double precision) + '3'::double precision))::integer)
+ -> Seq Scan on sj j2
+ Filter: ((((random() / '3'::double precision) + '3'::double precision))::integer = ((a * c) / 3))
+(6 rows)
+
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+ a | b | c | a | b | c
+---+---+---+---+---+---
+ 3 | 1 | 3 | 3 | 1 | 3
+(1 row)
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj j2
+ Filter: ((b IS NOT NULL) AND (a = 2) AND (c = 3))
+(2 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+ QUERY PLAN
+---------------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: ((2 = a) AND (c = 3))
+ -> Seq Scan on sj j2
+ Filter: ((c = 3) AND (a = 1))
+(6 rows)
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j1
+ Filter: (a = 2)
+ -> Seq Scan on sj j2
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (j1.b = j2.b)
+ -> Seq Scan on sj j2
+ Filter: (2 = a)
+ -> Seq Scan on sj j1
+(5 rows)
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+ QUERY PLAN
+---------------------------------------------------------------
+ Nested Loop
+ Join Filter: ((j1.b = j2.b) AND ((j1.a = 1) OR (j2.a = 1)))
+ -> Seq Scan on sj j1
+ -> Materialize
+ -> Seq Scan on sj j2
+(5 rows)
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+ QUERY PLAN
+-----------------------------------------------------------
+ Aggregate
+ -> Bitmap Heap Scan on tab_with_flag
+ Recheck Cond: (id = ANY ('{2,3}'::integer[]))
+ Filter: ((is_flag IS NULL) OR (is_flag = 0))
+ -> Bitmap Index Scan on tab_with_flag_pkey
+ Index Cond: (id = ANY ('{2,3}'::integer[]))
+(6 rows)
+
+DROP TABLE tab_with_flag;
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+ QUERY PLAN
+---------------------------------
+ HashAggregate
+ Group Key: q.b
+ Filter: (sum(q.a) = 1)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+(5 rows)
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.sj y
+ Output: y.a, y.b, y.c
+ Filter: (y.a IS NOT NULL)
+ -> Function Scan on pg_catalog.generate_series gs
+ Output: gs.i
+ Function Call: generate_series(1, y.a)
+(8 rows)
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+ QUERY PLAN
+------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((q.a + q.a) = r.a)
+ -> Seq Scan on sj q
+ Filter: (a IS NOT NULL)
+ -> Materialize
+ -> Seq Scan on sj r
+(6 rows)
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+ QUERY PLAN
+-----------------------------------------------------
+ Seq Scan on sj q
+ Filter: ((a IS NOT NULL) AND (b = 2) AND (b = 1))
+(2 rows)
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+ QUERY PLAN
+-----------------------------------------------------
+ Nested Loop
+ Join Filter: (k1.b = j2.b)
+ -> Nested Loop
+ -> Index Scan using sk_a_idx on sk k1
+ -> Index Only Scan using sk_a_idx on sk k2
+ Index Cond: (a = k1.a)
+ -> Materialize
+ -> Index Scan using sj_a_key on sj j2
+ Index Cond: (a IS NOT NULL)
+(9 rows)
+
+reset join_collapse_limit;
+reset enable_seqscan;
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+ QUERY PLAN
+------------------------------------------
+ Seq Scan on public.emp1 e2
+ Output: e2.id, e2.code, e2.id, e2.code
+ Filter: (e2.code <> e2.code)
+(3 rows)
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+ QUERY PLAN
+-----------------------------------------
+ Aggregate
+ -> Seq Scan on emp1 c3
+ Filter: ((id * id) IS NOT NULL)
+(3 rows)
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+ QUERY PLAN
+---------------------------
+ HashSetOp Except All
+ -> Seq Scan on emp1 c2
+ -> Seq Scan on emp1 c3
+(3 rows)
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on emp1 t1
+ -> Materialize
+ -> Nested Loop Left Join
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t4
+(7 rows)
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+ QUERY PLAN
+----------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.emp1 t1
+ Output: t1.id, t1.code
+ -> Materialize
+ Output: t3.id
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id
+ Filter: (1 = 1)
+(9 rows)
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+ QUERY PLAN
+----------------------------------------------------------
+ Nested Loop Left Join
+ Output: t2.id, t2.code, t2.id, t2.code, (t2.id), t3.t3
+ -> Seq Scan on public.emp1 t2
+ Output: t2.id, t2.code
+ -> Function Scan on pg_catalog.generate_series t3
+ Output: t3.t3, t2.id
+ Function Call: generate_series(1, 1)
+(7 rows)
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+ QUERY PLAN
+------------------------------------------------------
+ Nested Loop Left Join
+ Output: t1.id, (t1.id), t3.id
+ -> Function Scan on pg_catalog.generate_series t1
+ Output: t1.id
+ Function Call: generate_series(1, 10)
+ -> Seq Scan on public.emp1 t3
+ Output: t3.id, t1.id
+(7 rows)
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+ QUERY PLAN
+----------------------------------------------
+ Nested Loop Left Join
+ Join Filter: ((t2.id > 1) AND (t2.id < 2))
+ -> Seq Scan on emp1 t2
+ -> Materialize
+ -> Seq Scan on emp1 t3
+(5 rows)
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ QUERY PLAN
+-------------------------------------------------------
+ Update on emp1
+ -> Nested Loop
+ -> Seq Scan on emp1
+ -> Index Scan using emp1_pkey on emp1 emp1_1
+ Index Cond: (id = emp1.id)
+(5 rows)
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+ id | code | code
+----+------+------
+ 1 | 2 | 1
+ 2 | 2 | 1
+(2 rows)
+
+TRUNCATE emp1;
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+ QUERY PLAN
+-------------------------------------
+ Update on sj sq
+ -> Nested Loop
+ Join Filter: (sq.a = sz.a)
+ -> Seq Scan on sj sq
+ -> Materialize
+ -> Seq Scan on sj sz
+(6 rows)
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+ QUERY PLAN
+--------------------------------------
+ Update on sj sj_1
+ -> Nested Loop
+ Join Filter: (sj.a = sj_1.a)
+ -> Seq Scan on sj sj_1
+ -> Materialize
+ -> Seq Scan on sj
+(6 rows)
+
+DROP RULE sj_del_rule ON sj CASCADE;
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ QUERY PLAN
+--------------------------
+ Result
+ One-Time Filter: false
+(2 rows)
+
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+ ?column?
+----------
+(0 rows)
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on emp1 t1
+ -> Index Scan using emp1_pkey on emp1 t2
+ Index Cond: (id = t1.code)
+ Filter: (code > 0)
+(5 rows)
+
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+ id | code
+----+------
+ 1 | 1
+ 2 | 1
+(2 rows)
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+ QUERY PLAN
+------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: (b = 1)
+ -> Seq Scan on sl t2
+ Filter: (b = 2)
+(6 rows)
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (t1.a = t2.a)
+ -> Seq Scan on sl t1
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 1))
+ -> Seq Scan on sl t2
+ Filter: ((c IS NOT NULL) AND (b IS NOT NULL) AND (a IS NOT NULL) AND (b = 2))
+(6 rows)
+
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+ QUERY PLAN
+---------------------------------------------------------------------------------------------
+ Seq Scan on public.sl t2
+ Output: t2.a, t2.b, t2.c, t2.a, t2.b, t2.c
+ Filter: ((t2.c IS NOT NULL) AND (t2.b IS NOT NULL) AND (t2.a IS NOT NULL) AND (t2.a = 3))
+(3 rows)
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(5 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+ QUERY PLAN
+-------------------------------
+ Nested Loop
+ Join Filter: (n1.a <> n2.a)
+ -> Nested Loop
+ -> Seq Scan on sl
+ -> Seq Scan on sj n2
+ Filter: (a = 1)
+ -> Seq Scan on sj n1
+(7 rows)
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Nested Loop
+ Join Filter: (sj_t3.id = sj_t1.id)
+ -> Nested Loop
+ Join Filter: (sj_t2.id = sj_t3.id)
+ -> Nested Loop Semi Join
+ -> Nested Loop
+ -> HashAggregate
+ Group Key: sj_t3.id
+ -> Nested Loop
+ -> Seq Scan on sj_t4
+ -> Materialize
+ -> Bitmap Heap Scan on sj_t3
+ Recheck Cond: (a = 1)
+ -> Bitmap Index Scan on sj_t3_a_id_idx
+ Index Cond: (a = 1)
+ -> Index Only Scan using sj_t2_id_idx on sj_t2 sj_t2_1
+ Index Cond: (id = sj_t3.id)
+ -> Nested Loop
+ -> Index Only Scan using sj_t3_a_id_idx on sj_t3 sj_t3_1
+ Index Cond: ((a = 1) AND (id = sj_t3.id))
+ -> Seq Scan on sj_t4 sj_t4_1
+ -> Index Only Scan using sj_t2_id_idx on sj_t2
+ Index Cond: (id = sj_t2_1.id)
+ -> Seq Scan on sj_t1
+(24 rows)
+
+--
+-- Test RowMarks-related code
+--
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+ QUERY PLAN
+---------------------------------
+ LockRows
+ -> Seq Scan on sj a2
+ Filter: (a IS NOT NULL)
+(3 rows)
+
+reset enable_hashjoin;
+reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 352abc0bd42..83228cfca29 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -168,10 +168,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
+ enable_self_join_elimination | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 28ed7910d01..7fc2159349b 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
+-- Test that broken ECs are processed correctly during self join removal.
+-- Disable merge joins so that we don't get an error about missing commutator.
+-- Test both orientations of the join clause, because only one of them breaks
+-- the EC.
+set enable_mergejoin to off;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on m.ff + n.ff = p.f1;
+
+explain (costs off)
+ select * from ec0 m join ec0 n on m.ff = n.ff
+ join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
+
+reset enable_mergejoin;
+
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c7349eab933..c29d13b9fed 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2175,6 +2175,17 @@ select c.id, ss.a from c
left join (select d.a from onerow, d left join b on d.a = b.id) ss
on c.id = ss.a;
+-- check the case when the placeholder relates to an outer join and its
+-- inner in the press field but actually uses only the outer side of the join
+explain (costs off)
+SELECT q.val FROM b LEFT JOIN (
+ SELECT (q1.z IS NOT NULL) AS val
+ FROM b LEFT JOIN (
+ SELECT (t1.b_id IS NOT NULL) AS z FROM a t1 LEFT JOIN a t2 USING (id)
+ ) AS q1
+ ON true
+) AS q ON true;
+
CREATE TEMP TABLE parted_b (id int PRIMARY KEY) partition by range(id);
CREATE TEMP TABLE parted_b1 partition of parted_b for values from (0) to (10);
@@ -2409,6 +2420,489 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
+--
+-- test that semi- or inner self-joins on a unique column are removed
+--
+
+-- enable only nestloop to get more predictable plans
+set enable_hashjoin to off;
+set enable_mergejoin to off;
+
+create table sj (a int unique, b int, c int unique);
+insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
+analyze sj;
+
+-- Trivial self-join case.
+explain (costs off)
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
+
+-- Self-join removal performs after a subquery pull-up process and could remove
+-- such kind of self-join too. Check this option.
+explain (costs off)
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+select * from sj p
+where exists (select * from sj q
+ where q.a = p.a and q.b < 10);
+
+-- Don't remove self-join for the case of equality of two different unique columns.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
+
+-- Ensure that relations with TABLESAMPLE clauses are not considered as
+-- candidates to be removed
+explain (costs off)
+select * from sj t1
+ join lateral
+ (select * from sj tablesample system(t1.b)) s
+ on t1.a = s.a;
+
+-- Ensure that SJE does not form a self-referential lateral dependency
+explain (costs off)
+select * from sj t1
+ left join lateral
+ (select t1.a as t1a, * from sj t2) s
+ on true
+where t1.a = s.a;
+
+-- Degenerated case.
+explain (costs off)
+select * from
+ (select a as x from sj where false) as q1,
+ (select a as y from sj where false) as q2
+where q1.x = q2.y;
+
+-- We can't use a cross-EC generated self join qual because of current logic of
+-- the generate_join_implied_equalities routine.
+explain (costs off)
+select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
+explain (costs off)
+select * from sj t1, sj t2, sj t3
+where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
+ t1.b = t3.b and t3.b = t3.a;
+
+-- Double self-join removal.
+-- Use a condition on "b + 1", not on "b", for the second join, so that
+-- the equivalence class is different from the first one, and we can
+-- test the non-ec code path.
+explain (costs off)
+select *
+from sj t1
+ join sj t2 on t1.a = t2.a and t1.b = t2.b
+ join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
+
+-- subselect that references the removed relation
+explain (costs off)
+select t1.a, (select a from sj where a = t2.a and a = t1.a)
+from sj t1, sj t2
+where t1.a = t2.a;
+
+-- self-join under outer join
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on x.a = z.q1;
+
+explain (costs off)
+select * from sj x join sj y on x.a = y.a
+left join int8_tbl z on y.a = z.q1;
+
+explain (costs off)
+select * from (
+ select t1.*, t2.a as ax from sj t1 join sj t2
+ on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
+) as q1
+left join
+ (select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
+on q1.ax = q2.a;
+
+-- Test that placeholders are updated correctly after join removal
+explain (costs off)
+select * from (values (1)) x
+left join (select coalesce(y.q1, 1) from int8_tbl y
+ right join sj j1 inner join sj j2 on j1.a = j2.a
+ on true) z
+on true;
+
+-- Test that references to the removed rel in lateral subqueries are replaced
+-- correctly after join removal
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t1.a offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t3.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
+
+explain (verbose, costs off)
+select t4.a from sj t1
+ join sj t2 on t1.a = t2.a
+ join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
+
+-- Check updating of semi_rhs_exprs links from upper-level semi join to
+-- the removing relation
+explain (verbose, costs off)
+select t1.a from sj t1 where t1.b in (
+ select t2.b from sj t2 join sj t3 on t2.c=t3.c);
+
+--
+-- SJE corner case: uniqueness of an inner is [partially] derived from
+-- baserestrictinfo clauses.
+-- XXX: We really should allow SJE for these corner cases?
+--
+
+INSERT INTO sj VALUES (3, 1, 3);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
+
+-- Remove SJ, define uniqueness by a constant
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+-- Return one row
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
+
+-- Remove SJ, define uniqueness by a constant expression
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
+
+-- Shuffle a clause. Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+-- Return no rows
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
+
+-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
+-- after SJ elimination it shouldn't be a mergejoinable clause.
+EXPLAIN (COSTS OFF)
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+SELECT t4.*
+FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
+JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
+
+-- Functional index
+CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
+
+-- Restriction contains expressions in both sides, Remove SJ.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+-- Empty set of rows should be returned
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
+ AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
+
+-- Restriction contains volatile function - disable SJE feature.
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+-- Return one row
+SELECT * FROM sj j1, sj j2
+WHERE j1.b = j2.b
+ AND (j1.a*j1.c/3) = (random()/3 + 3)::int
+ AND (random()/3 + 3)::int = (j2.a*j2.c/3);
+
+-- Multiple filters
+CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
+
+-- Remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+ SELECT * FROM sj j1, sj j2
+ WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
+
+CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
+
+-- Don't remove SJ
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
+
+DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
+
+-- Test that OR predicated are updated correctly after join removal
+CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
+CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
+
+EXPLAIN (COSTS OFF)
+SELECT COUNT(*) FROM tab_with_flag
+WHERE
+ (is_flag IS NULL OR is_flag = 0)
+ AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
+DROP TABLE tab_with_flag;
+
+-- HAVING clause
+explain (costs off)
+select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
+
+-- update lateral references and range table entry reference
+explain (verbose, costs off)
+select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+explain (verbose, costs off)
+select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
+ lateral generate_series(1, q.a) gs(i);
+
+-- Test that a non-EC-derived join clause is processed correctly. Use an
+-- outer join so that we can't form an EC.
+explain (costs off) select * from sj p join sj q on p.a = q.a
+ left join sj r on p.a + q.a = r.a;
+
+-- FIXME this constant false filter doesn't look good. Should we merge
+-- equivalence classes?
+explain (costs off)
+select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
+
+-- Check that attr_needed is updated correctly after self-join removal. In this
+-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
+-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
+-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
+-- seqscan. Also disable reordering of joins because this test depends on a
+-- particular join tree.
+create table sk (a int, b int);
+create index on sk(a);
+set join_collapse_limit to 1;
+set enable_seqscan to off;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
+explain (costs off) select 1 from
+ (sk k1 join sk k2 on k1.a = k2.a)
+ join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
+reset join_collapse_limit;
+reset enable_seqscan;
+
+-- Check that clauses from the join filter list is not lost on the self-join removal
+CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
+
+-- Shuffle self-joined relations. Only in the case of iterative deletion
+-- attempts explains of these queries will be identical.
+CREATE UNIQUE INDEX ON emp1((id*id));
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
+WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
+
+-- Check the usage of a parse tree by the set operations (bug #18170)
+EXPLAIN (COSTS OFF)
+SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
+WHERE c2.id IS NOT NULL
+EXCEPT ALL
+SELECT c3.code FROM emp1 c3;
+
+-- Check that SJE removes references from PHVs correctly
+explain (costs off)
+select * from emp1 t1 left join
+ (select coalesce(t3.code, 1) from emp1 t2
+ left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
+ on true)
+on true;
+
+-- Check that SJE removes the whole PHVs correctly
+explain (verbose, costs off)
+select 1 from emp1 t1 left join
+ ((select 1 as x, * from emp1 t2) s1 inner join
+ (select * from emp1 t3) s2 on s1.id = s2.id)
+ on true
+where s1.x = 1;
+
+-- Check that PHVs do not impose any constraints on removing self joins
+explain (verbose, costs off)
+select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
+ lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
+
+explain (verbose, costs off)
+select * from generate_series(1,10) t1(id) left join
+ lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
+on true;
+
+-- Check that SJE replaces join clauses involving the removed rel correctly
+explain (costs off)
+select * from emp1 t1
+ inner join emp1 t2 on t1.id = t2.id
+ left join emp1 t3 on t1.id > 1 and t1.id < 2;
+
+-- Check that SJE doesn't replace the target relation
+EXPLAIN (COSTS OFF)
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+INSERT INTO emp1 VALUES (1, 1), (2, 1);
+
+WITH t1 AS (SELECT * FROM emp1)
+UPDATE emp1 SET code = t1.code + 1 FROM t1
+WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
+
+TRUNCATE emp1;
+
+EXPLAIN (COSTS OFF)
+UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
+
+CREATE RULE sj_del_rule AS ON DELETE TO sj
+ DO INSTEAD
+ UPDATE sj SET a = 1 WHERE a = old.a;
+EXPLAIN (COSTS OFF) DELETE FROM sj;
+DROP RULE sj_del_rule ON sj CASCADE;
+
+-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
+insert into emp1 values (1, 1);
+explain (costs off)
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+select 1 from emp1 full join
+ (select * from emp1 t1 join
+ emp1 t2 join emp1 t3 on t2.id = t3.id
+ on true
+ where false) s on true
+where false;
+
+-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
+-- made with different set of quals
+insert into emp1 values (2, 1);
+explain (costs off)
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+select * from emp1 t1 where exists (select * from emp1 t2
+ where t2.id = t1.code and t2.code > 0);
+
+-- We can remove the join even if we find the join can't duplicate rows and
+-- the base quals of each side are different. In the following case we end up
+-- moving quals over to s1 to make it so it can't match any rows.
+create table sl(a int, b int, c int);
+create unique index on sl(a, b);
+vacuum analyze sl;
+
+-- Both sides are unique, but base quals are different
+explain (costs off)
+select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
+
+-- Check NullTest in baserestrictinfo list
+explain (costs off)
+select * from sl t1, sl t2
+where t1.a = t2.a and t1.b = 1 and t2.b = 2
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+explain (verbose, costs off)
+select * from sl t1, sl t2
+where t1.b = t2.b and t2.a = 3 and t1.a = 3
+ and t1.c IS NOT NULL and t2.c IS NOT NULL
+ and t2.b IS NOT NULL and t1.b IS NOT NULL
+ and t1.a IS NOT NULL and t2.a IS NOT NULL;
+
+-- Join qual isn't mergejoinable, but inner is unique.
+EXPLAIN (COSTS OFF)
+SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM
+(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
+WHERE q0.a = 1;
+
+-- Check optimization disabling if it will violate special join conditions.
+-- Two identical joined relations satisfies self join removal conditions but
+-- stay in different special join infos.
+CREATE TABLE sj_t1 (id serial, a int);
+CREATE TABLE sj_t2 (id serial, a int);
+CREATE TABLE sj_t3 (id serial, a int);
+CREATE TABLE sj_t4 (id serial, a int);
+
+CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
+CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM sj_t1
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) t2t3t4
+ON sj_t1.id = t2t3t4.id
+JOIN (
+ SELECT sj_t2.id AS id FROM sj_t2
+ WHERE EXISTS
+ (
+ SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
+ )
+ ) _t2t3t4
+ON sj_t1.id = _t2t3t4.id;
+
+--
+-- Test RowMarks-related code
+--
+
+-- Both sides have explicit LockRows marks
+EXPLAIN (COSTS OFF)
+SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
+
+reset enable_hashjoin;
+reset enable_mergejoin;
+
--
-- Test hints given on incorrect column references are useful
--
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b6c170ac249..bce4214503d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2593,6 +2593,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
+SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@@ -4056,6 +4057,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
+UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func
--
2.39.5 (Apple Git-154)
On 13.02.2025 02:00, Alexander Korotkov wrote:
Thank you. I've integrated that into a patch. However, I've to
change keep_rinfo_list to be passed by pointer to
add_non_redundant_clauses(), because it might be changed in
distribute_restrictinfo_to_rels(). Without that there is a case of
duplicated clause in regression tests.I've changed 'inner' and 'outer' vise versa in
remove_self_joins_one_group() for better readability (I believe that
was discussed upthread but lost). Also, I did a round of improvement
for comments and commit message.
On 15.02.2025 12:19, Alexander Korotkov wrote:
On Thu, Feb 13, 2025 at 1:00 AM Alexander Korotkov<aekorotkov@gmail.com> wrote:
On Tue, Feb 11, 2025 at 5:31 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:Hi! Thank you for the work with this subject, I think it is really
important.On 10.02.2025 22:58, Alexander Korotkov wrote:
Hi!
On Mon, Feb 10, 2025 at 7:19 AM Andrei Lepikhov<lepihov@gmail.com> wrote:
On 9/2/2025 18:41, Alexander Korotkov wrote:
Regarding adjust_relid_set() and replace_relid(). I think they are
now strictly equivalent, except for the case then old relid is given
and not found. In this case adjust_relid_set() returns the original
relids while replace_relid() returns a copy. The behavior of
adjust_relid_set() appears more desirable as we don't need extra
copying when no modification is done. So, I've replaced all
replace_relid() with adjust_relid_set().Ok, I glanced into it, and it makes sense to merge these routines.
I think the comment to adjust_relid_set() should be arranged, too. See
the attachment for a short variant of such modification.Also, I did some grammar correction to your new comment in tests.
Thanks!
I've further revised adjust_relid_set() header comment.
Looking back to the work done since previous attempt to commit this to
pg17, I can highlight following.
1) We're now using more of existing infrastructure including
adjust_relid_set() and ChangeVarNodes(). The most of complexity is
still there though.
2) We've checked few ways to further simplify this patch. But yet the
current way still feels to be best possible.
3) For sure, several bugs were fixed.I think we could give it another chance for pg18 after some further
polishing (at least commit message still needs to be revised). Any
thoughts on this? Tom?I didn't find any mistakes, I just have a refactoring remark. I think
the part where we add non-redundant expressions with the
binfo_candidates, jinfo_candidates
check can be moved to a separate function, otherwise the code is very
repetitive in this place. I did it and attached diff fileThank you. I've integrated that into a patch. However, I've to
change keep_rinfo_list to be passed by pointer to
add_non_redundant_clauses(), because it might be changed in
distribute_restrictinfo_to_rels(). Without that there is a case of
duplicated clause in regression tests.I've changed 'inner' and 'outer' vise versa in
remove_self_joins_one_group() for better readability (I believe that
was discussed upthread but lost). Also, I did a round of improvement
for comments and commit message.I've corrected some spelling error reported by Alexander Lakhin
privately to me. Also, I've revised comments around ChangeVarNodes()
and ChangeVarNodesExtended(). I'm going to continue nitpicking this
patch during next couple days then push it if no objections.
I agree with your corrections and for me patch looks good.
--
Regards,
Alena Rybakina
Postgres Professional
Alexander Korotkov <aekorotkov@gmail.com> writes:
I've corrected some spelling error reported by Alexander Lakhin
privately to me. Also, I've revised comments around ChangeVarNodes()
and ChangeVarNodesExtended(). I'm going to continue nitpicking this
patch during next couple days then push it if no objections.
Coverity has another nit-pick:
/srv/coverity/git/pgsql-git/postgresql/src/backend/optimizer/plan/analyzejoins.c: 327 in remove_rel_from_query()
321 static void
322 remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
323 int subst, SpecialJoinInfo *sjinfo,
324 Relids joinrelids)
325 {
326 int relid = rel->relid;
CID 1643155: Integer handling issues (INTEGER_OVERFLOW)
Expression "ojrelid", which is equal to 4294967295, where "(sjinfo != NULL) ? sjinfo->ojrelid : 4294967295U" is known to be equal to 4294967295, overflows the type that receives it, a signed integer 32 bits wide.
327 int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
328 Index rti;
329 ListCell *l;
330
331 /*
332 * Update all_baserels and related relid sets.
This is unhappy because sjinfo->ojrelid, which is declared Index
(that is, unsigned int) is being converted to int. Admittedly
there are plenty of other places that do likewise, but the additional
load of assuming that -1 isn't a possible value of sjinfo->ojrelid
seems to be enough to draw its ire.
I suggest finding another way to code this function that doesn't
depend on that type pun. I think it's fairly accidental that
adjust_relid_set doesn't misbehave on -1 anyway, so personally I'd
get rid of that local variable entirely in favor of something like
if (sjinfo != NULL)
{
root->outer_join_rels = adjust_relid_set(root->outer_join_rels,
sjinfo->ojrelid, subst);
root->all_query_rels = adjust_relid_set(root->all_query_rels,
sjinfo->ojrelid, subst);
}
regards, tom lane
On Sun, Feb 23, 2025 at 7:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
I've corrected some spelling error reported by Alexander Lakhin
privately to me. Also, I've revised comments around ChangeVarNodes()
and ChangeVarNodesExtended(). I'm going to continue nitpicking this
patch during next couple days then push it if no objections.Coverity has another nit-pick:
/srv/coverity/git/pgsql-git/postgresql/src/backend/optimizer/plan/analyzejoins.c: 327 in remove_rel_from_query()
321 static void
322 remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
323 int subst, SpecialJoinInfo *sjinfo,
324 Relids joinrelids)
325 {
326 int relid = rel->relid;CID 1643155: Integer handling issues (INTEGER_OVERFLOW)
Expression "ojrelid", which is equal to 4294967295, where "(sjinfo != NULL) ? sjinfo->ojrelid : 4294967295U" is known to be equal to 4294967295, overflows the type that receives it, a signed integer 32 bits wide.327 int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
328 Index rti;
329 ListCell *l;
330
331 /*
332 * Update all_baserels and related relid sets.This is unhappy because sjinfo->ojrelid, which is declared Index
(that is, unsigned int) is being converted to int. Admittedly
there are plenty of other places that do likewise, but the additional
load of assuming that -1 isn't a possible value of sjinfo->ojrelid
seems to be enough to draw its ire.
Thank you for reporting this!
I suggest finding another way to code this function that doesn't
depend on that type pun. I think it's fairly accidental that
adjust_relid_set doesn't misbehave on -1 anyway, so personally I'd
get rid of that local variable entirely in favor of something likeif (sjinfo != NULL)
{
root->outer_join_rels = adjust_relid_set(root->outer_join_rels,
sjinfo->ojrelid, subst);
root->all_query_rels = adjust_relid_set(root->all_query_rels,
sjinfo->ojrelid, subst);
}
There is my attempt to implement this approach. Getting rid of local
variable (and computation of the same value other way) required to
change arguments of remove_rel_from_eclass() as well. I'm going to
further polish this tomorrow.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v1-0001-Get-rid-of-ojrelid-local-variable-in-remove_rel_f.patchapplication/octet-stream; name=v1-0001-Get-rid-of-ojrelid-local-variable-in-remove_rel_f.patchDownload
From b7c23991adc44c56beb8510b6f387607fcceb475 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 23 Feb 2025 23:06:33 +0200
Subject: [PATCH v1] Get rid of ojrelid local variable in
remove_rel_from_query()
Reported-by:
Bug:
Discussion:
Author:
Reviewed-by:
Tested-by:
Backpatch-through:
---
src/backend/optimizer/plan/analyzejoins.c | 81 +++++++++++++++--------
1 file changed, 53 insertions(+), 28 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 3aa04d0d4e1..b1e173c63bc 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -59,7 +59,8 @@ static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
- int relid, int ojrelid, int subst);
+ SpecialJoinInfo *sjinfo,
+ int relid, int subst);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -324,7 +325,6 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
Relids joinrelids)
{
int relid = rel->relid;
- int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
@@ -332,9 +332,15 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
* Update all_baserels and related relid sets.
*/
root->all_baserels = adjust_relid_set(root->all_baserels, relid, subst);
- root->outer_join_rels = adjust_relid_set(root->outer_join_rels, ojrelid, subst);
root->all_query_rels = adjust_relid_set(root->all_query_rels, relid, subst);
- root->all_query_rels = adjust_relid_set(root->all_query_rels, ojrelid, subst);
+
+ if (sjinfo != NULL)
+ {
+ root->outer_join_rels = bms_del_member(root->outer_join_rels,
+ sjinfo->ojrelid);
+ root->all_query_rels = bms_del_member(root->all_query_rels,
+ sjinfo->ojrelid);
+ }
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -366,22 +372,30 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
if (sjinfo != NULL)
{
- Assert(subst <= 0 && ojrelid > 0);
-
- /* Remove ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ Assert(subst <= 0);
+
+ /* Remove sjinfo->ojrelid bits from the sets: */
+ sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand,
+ sjinfo->ojrelid);
+ sjinf->min_righthand = bms_del_member(sjinf->min_righthand,
+ sjinfo->ojrelid);
+ sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand,
+ sjinfo->ojrelid);
+ sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand,
+ sjinfo->ojrelid);
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l,
+ sjinfo->ojrelid);
+ sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r,
+ sjinfo->ojrelid);
+ sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l,
+ sjinfo->ojrelid);
+ sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r,
+ sjinfo->ojrelid);
}
else
{
- Assert(subst > 0 && ojrelid == -1);
+ Assert(subst > 0);
ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
}
@@ -408,7 +422,7 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
+ (sjinfo == NULL || !bms_is_member(sjinfo->ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -419,7 +433,9 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
PlaceHolderVar *phv = phinfo->ph_var;
phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at, relid, subst);
- phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at, ojrelid, subst);
+ if (sjinfo != NULL)
+ phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at,
+ sjinfo->ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
/* Reduce ph_needed to contain only "relation 0"; see below */
if (bms_is_member(0, phinfo->ph_needed))
@@ -437,7 +453,9 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
/* ph_lateral might or might not be empty */
phv->phrels = adjust_relid_set(phv->phrels, relid, subst);
- phv->phrels = adjust_relid_set(phv->phrels, ojrelid, subst);
+ if (sjinfo != NULL)
+ phv->phrels = adjust_relid_set(phv->phrels,
+ sjinfo->ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
@@ -454,8 +472,8 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
if (bms_is_member(relid, ec->ec_relids) ||
- (sjinfo == NULL || bms_is_member(ojrelid, ec->ec_relids)))
- remove_rel_from_eclass(ec, relid, ojrelid, subst);
+ (sjinfo == NULL || bms_is_member(sjinfo->ojrelid, ec->ec_relids)))
+ remove_rel_from_eclass(ec, sjinfo, relid, subst);
}
/*
@@ -671,7 +689,8 @@ remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid)
}
/*
- * Remove any references to relid or ojrelid from the EquivalenceClass.
+ * Remove any references to relid or sjinfo->ojrelid (if sjinfo != NULL)
+ * from the EquivalenceClass.
*
* Like remove_rel_from_restrictinfo, we don't worry about cleaning out
* any nullingrel bits in contained Vars and PHVs. (This might have to be
@@ -680,13 +699,16 @@ remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid)
* level(s).
*/
static void
-remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid, int subst)
+remove_rel_from_eclass(EquivalenceClass *ec, SpecialJoinInfo *sjinfo,
+ int relid, int subst)
{
ListCell *lc;
/* Fix up the EC's overall relids */
ec->ec_relids = adjust_relid_set(ec->ec_relids, relid, subst);
- ec->ec_relids = adjust_relid_set(ec->ec_relids, ojrelid, subst);
+ if (sjinfo != NULL)
+ ec->ec_relids = adjust_relid_set(ec->ec_relids,
+ sjinfo->ojrelid, subst);
/*
* Fix up the member expressions. Any non-const member that ends with
@@ -698,11 +720,14 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid, int subst)
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc);
if (bms_is_member(relid, cur_em->em_relids) ||
- (ojrelid != -1 && bms_is_member(ojrelid, cur_em->em_relids)))
+ (sjinfo != NULL && bms_is_member(sjinfo->ojrelid,
+ cur_em->em_relids)))
{
Assert(!cur_em->em_is_const);
cur_em->em_relids = adjust_relid_set(cur_em->em_relids, relid, subst);
- cur_em->em_relids = adjust_relid_set(cur_em->em_relids, ojrelid, subst);
+ if (sjinfo != NULL)
+ cur_em->em_relids = adjust_relid_set(cur_em->em_relids,
+ sjinfo->ojrelid, subst);
if (bms_is_empty(cur_em->em_relids))
ec->ec_members = foreach_delete_current(ec->ec_members, lc);
}
@@ -713,10 +738,10 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid, int subst)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
- if (ojrelid == -1)
+ if (sjinfo == NULL)
ChangeVarNodes((Node *) rinfo, relid, subst, 0);
else
- remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
+ remove_rel_from_restrictinfo(rinfo, relid, sjinfo->ojrelid);
}
/*
--
2.39.5 (Apple Git-154)
On 23/2/2025 22:15, Alexander Korotkov wrote:
There is my attempt to implement this approach. Getting rid of local
variable (and computation of the same value other way) required to
change arguments of remove_rel_from_eclass() as well. I'm going to
further polish this tomorrow.
I passed through the patch. It works correctly.
Multiple ifs in a single routine is not ideal. I have thought about it
already, and it seems the remove_rel_from_query needs refactoring: when
I first reused it for self-join removal, we didn't have the 'ojrelid'
machinery, and it was implemented smoothly.
Right now, this code contains multiple places where we need to remove
the outer join relid and separating the removal code for the baserel and
outer join may simplify the logic and make it safer.
But the way to do it is not apparent now. May be if we implement a new
technique of query tree reduction, the approach will become more evident.
--
regards, Andrei Lepikhov
Hi, Andrei!
Thank you for your feedback.
On Mon, Feb 24, 2025 at 12:12 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 23/2/2025 22:15, Alexander Korotkov wrote:
There is my attempt to implement this approach. Getting rid of local
variable (and computation of the same value other way) required to
change arguments of remove_rel_from_eclass() as well. I'm going to
further polish this tomorrow.I passed through the patch. It works correctly.
Multiple ifs in a single routine is not ideal.
I would say there is a brachcing anyway. Even if it is inside
adjust_relid_set(). Patch makes it more explicit. Ideal or not, I
don't think it gets worse.
I have thought about it
already, and it seems the remove_rel_from_query needs refactoring: when
I first reused it for self-join removal, we didn't have the 'ojrelid'
machinery, and it was implemented smoothly.
Right now, this code contains multiple places where we need to remove
the outer join relid and separating the removal code for the baserel and
outer join may simplify the logic and make it safer.
But the way to do it is not apparent now. May be if we implement a new
technique of query tree reduction, the approach will become more evident.
Could you, please, elaborate more on what you mean by "new technique
of query tree reduction"?
------
Regards,
Alexander Korotkov
Supabase
On 24/2/2025 11:57, Alexander Korotkov wrote:
Hi, Andrei!
Thank you for your feedback.
On Mon, Feb 24, 2025 at 12:12 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 23/2/2025 22:15, Alexander Korotkov wrote:
There is my attempt to implement this approach. Getting rid of local
variable (and computation of the same value other way) required to
change arguments of remove_rel_from_eclass() as well. I'm going to
further polish this tomorrow.I passed through the patch. It works correctly.
Multiple ifs in a single routine is not ideal.
I would say there is a brachcing anyway. Even if it is inside
adjust_relid_set(). Patch makes it more explicit. Ideal or not, I
don't think it gets worse.I have thought about it
already, and it seems the remove_rel_from_query needs refactoring: when
I first reused it for self-join removal, we didn't have the 'ojrelid'
machinery, and it was implemented smoothly.
Right now, this code contains multiple places where we need to remove
the outer join relid and separating the removal code for the baserel and
outer join may simplify the logic and make it safer.
But the way to do it is not apparent now. May be if we implement a new
technique of query tree reduction, the approach will become more evident.Could you, please, elaborate more on what you mean by "new technique
of query tree reduction"?
I mean any transformations and optimisations that reduce search space
for optimisation. Right now, I see the features reduce_unique_semijoins,
remove_useless_joins, and remove_useless_self_joins.
In practice, I see at least a join on a foreign key, where some cases
potentially allow the removal of the JOIN operator.
--
regards, Andrei Lepikhov
On Mon, Feb 24, 2025 at 12:57 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Mon, Feb 24, 2025 at 12:12 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 23/2/2025 22:15, Alexander Korotkov wrote:
There is my attempt to implement this approach. Getting rid of local
variable (and computation of the same value other way) required to
change arguments of remove_rel_from_eclass() as well. I'm going to
further polish this tomorrow.I passed through the patch. It works correctly.
Multiple ifs in a single routine is not ideal.
I would say there is a brachcing anyway. Even if it is inside
adjust_relid_set(). Patch makes it more explicit. Ideal or not, I
don't think it gets worse.
I've added a commit message to the patch. I'm going to push it if no
objections.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v2-0001-Get-rid-of-ojrelid-local-variable-in-remove_rel_f.patchapplication/octet-stream; name=v2-0001-Get-rid-of-ojrelid-local-variable-in-remove_rel_f.patchDownload
From 48023c387ff8223f18d78e9c698a4d831a56fdfb Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sun, 23 Feb 2025 23:06:33 +0200
Subject: [PATCH v2] Get rid of ojrelid local variable in
remove_rel_from_query()
As spotted by Coverity, the calculation of ojrelid mixes signed and unsigned
types causes possible overflow and undefined behavior. Instead of trying to
fix the expression, this commit eliminates the relied local variable. The
explicit branching is used to replace the -1 value. That, in turn, requires
changing the signature of the remove_rel_from_eclass() function.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/914330.1740330169%40sss.pgh.pa.us
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
---
src/backend/optimizer/plan/analyzejoins.c | 81 +++++++++++++++--------
1 file changed, 53 insertions(+), 28 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 3aa04d0d4e1..b1e173c63bc 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -59,7 +59,8 @@ static void remove_leftjoinrel_from_query(PlannerInfo *root, int relid,
static void remove_rel_from_restrictinfo(RestrictInfo *rinfo,
int relid, int ojrelid);
static void remove_rel_from_eclass(EquivalenceClass *ec,
- int relid, int ojrelid, int subst);
+ SpecialJoinInfo *sjinfo,
+ int relid, int subst);
static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -324,7 +325,6 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
Relids joinrelids)
{
int relid = rel->relid;
- int ojrelid = (sjinfo != NULL) ? sjinfo->ojrelid : -1;
Index rti;
ListCell *l;
@@ -332,9 +332,15 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
* Update all_baserels and related relid sets.
*/
root->all_baserels = adjust_relid_set(root->all_baserels, relid, subst);
- root->outer_join_rels = adjust_relid_set(root->outer_join_rels, ojrelid, subst);
root->all_query_rels = adjust_relid_set(root->all_query_rels, relid, subst);
- root->all_query_rels = adjust_relid_set(root->all_query_rels, ojrelid, subst);
+
+ if (sjinfo != NULL)
+ {
+ root->outer_join_rels = bms_del_member(root->outer_join_rels,
+ sjinfo->ojrelid);
+ root->all_query_rels = bms_del_member(root->all_query_rels,
+ sjinfo->ojrelid);
+ }
/*
* Likewise remove references from SpecialJoinInfo data structures.
@@ -366,22 +372,30 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
if (sjinfo != NULL)
{
- Assert(subst <= 0 && ojrelid > 0);
-
- /* Remove ojrelid bits from the sets: */
- sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand, ojrelid);
- sjinf->min_righthand = bms_del_member(sjinf->min_righthand, ojrelid);
- sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, ojrelid);
- sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand, ojrelid);
+ Assert(subst <= 0);
+
+ /* Remove sjinfo->ojrelid bits from the sets: */
+ sjinf->min_lefthand = bms_del_member(sjinf->min_lefthand,
+ sjinfo->ojrelid);
+ sjinf->min_righthand = bms_del_member(sjinf->min_righthand,
+ sjinfo->ojrelid);
+ sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand,
+ sjinfo->ojrelid);
+ sjinf->syn_righthand = bms_del_member(sjinf->syn_righthand,
+ sjinfo->ojrelid);
/* relid cannot appear in these fields, but ojrelid can: */
- sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l, ojrelid);
- sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r, ojrelid);
- sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l, ojrelid);
- sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r, ojrelid);
+ sjinf->commute_above_l = bms_del_member(sjinf->commute_above_l,
+ sjinfo->ojrelid);
+ sjinf->commute_above_r = bms_del_member(sjinf->commute_above_r,
+ sjinfo->ojrelid);
+ sjinf->commute_below_l = bms_del_member(sjinf->commute_below_l,
+ sjinfo->ojrelid);
+ sjinf->commute_below_r = bms_del_member(sjinf->commute_below_r,
+ sjinfo->ojrelid);
}
else
{
- Assert(subst > 0 && ojrelid == -1);
+ Assert(subst > 0);
ChangeVarNodes((Node *) sjinf->semi_rhs_exprs, relid, subst, 0);
}
@@ -408,7 +422,7 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- (sjinfo == NULL || !bms_is_member(ojrelid, phinfo->ph_eval_at)))
+ (sjinfo == NULL || !bms_is_member(sjinfo->ojrelid, phinfo->ph_eval_at)))
{
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
@@ -419,7 +433,9 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
PlaceHolderVar *phv = phinfo->ph_var;
phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at, relid, subst);
- phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at, ojrelid, subst);
+ if (sjinfo != NULL)
+ phinfo->ph_eval_at = adjust_relid_set(phinfo->ph_eval_at,
+ sjinfo->ojrelid, subst);
Assert(!bms_is_empty(phinfo->ph_eval_at)); /* checked previously */
/* Reduce ph_needed to contain only "relation 0"; see below */
if (bms_is_member(0, phinfo->ph_needed))
@@ -437,7 +453,9 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
/* ph_lateral might or might not be empty */
phv->phrels = adjust_relid_set(phv->phrels, relid, subst);
- phv->phrels = adjust_relid_set(phv->phrels, ojrelid, subst);
+ if (sjinfo != NULL)
+ phv->phrels = adjust_relid_set(phv->phrels,
+ sjinfo->ojrelid, subst);
Assert(!bms_is_empty(phv->phrels));
ChangeVarNodes((Node *) phv->phexpr, relid, subst, 0);
@@ -454,8 +472,8 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
EquivalenceClass *ec = (EquivalenceClass *) lfirst(l);
if (bms_is_member(relid, ec->ec_relids) ||
- (sjinfo == NULL || bms_is_member(ojrelid, ec->ec_relids)))
- remove_rel_from_eclass(ec, relid, ojrelid, subst);
+ (sjinfo == NULL || bms_is_member(sjinfo->ojrelid, ec->ec_relids)))
+ remove_rel_from_eclass(ec, sjinfo, relid, subst);
}
/*
@@ -671,7 +689,8 @@ remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid)
}
/*
- * Remove any references to relid or ojrelid from the EquivalenceClass.
+ * Remove any references to relid or sjinfo->ojrelid (if sjinfo != NULL)
+ * from the EquivalenceClass.
*
* Like remove_rel_from_restrictinfo, we don't worry about cleaning out
* any nullingrel bits in contained Vars and PHVs. (This might have to be
@@ -680,13 +699,16 @@ remove_rel_from_restrictinfo(RestrictInfo *rinfo, int relid, int ojrelid)
* level(s).
*/
static void
-remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid, int subst)
+remove_rel_from_eclass(EquivalenceClass *ec, SpecialJoinInfo *sjinfo,
+ int relid, int subst)
{
ListCell *lc;
/* Fix up the EC's overall relids */
ec->ec_relids = adjust_relid_set(ec->ec_relids, relid, subst);
- ec->ec_relids = adjust_relid_set(ec->ec_relids, ojrelid, subst);
+ if (sjinfo != NULL)
+ ec->ec_relids = adjust_relid_set(ec->ec_relids,
+ sjinfo->ojrelid, subst);
/*
* Fix up the member expressions. Any non-const member that ends with
@@ -698,11 +720,14 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid, int subst)
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc);
if (bms_is_member(relid, cur_em->em_relids) ||
- (ojrelid != -1 && bms_is_member(ojrelid, cur_em->em_relids)))
+ (sjinfo != NULL && bms_is_member(sjinfo->ojrelid,
+ cur_em->em_relids)))
{
Assert(!cur_em->em_is_const);
cur_em->em_relids = adjust_relid_set(cur_em->em_relids, relid, subst);
- cur_em->em_relids = adjust_relid_set(cur_em->em_relids, ojrelid, subst);
+ if (sjinfo != NULL)
+ cur_em->em_relids = adjust_relid_set(cur_em->em_relids,
+ sjinfo->ojrelid, subst);
if (bms_is_empty(cur_em->em_relids))
ec->ec_members = foreach_delete_current(ec->ec_members, lc);
}
@@ -713,10 +738,10 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid, int subst)
{
RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
- if (ojrelid == -1)
+ if (sjinfo == NULL)
ChangeVarNodes((Node *) rinfo, relid, subst, 0);
else
- remove_rel_from_restrictinfo(rinfo, relid, ojrelid);
+ remove_rel_from_restrictinfo(rinfo, relid, sjinfo->ojrelid);
}
/*
--
2.39.5 (Apple Git-154)
On Mon, Feb 24, 2025 at 2:22 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 24/2/2025 11:57, Alexander Korotkov wrote:
Hi, Andrei!
Thank you for your feedback.
On Mon, Feb 24, 2025 at 12:12 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 23/2/2025 22:15, Alexander Korotkov wrote:
There is my attempt to implement this approach. Getting rid of local
variable (and computation of the same value other way) required to
change arguments of remove_rel_from_eclass() as well. I'm going to
further polish this tomorrow.I passed through the patch. It works correctly.
Multiple ifs in a single routine is not ideal.
I would say there is a brachcing anyway. Even if it is inside
adjust_relid_set(). Patch makes it more explicit. Ideal or not, I
don't think it gets worse.I have thought about it
already, and it seems the remove_rel_from_query needs refactoring: when
I first reused it for self-join removal, we didn't have the 'ojrelid'
machinery, and it was implemented smoothly.
Right now, this code contains multiple places where we need to remove
the outer join relid and separating the removal code for the baserel and
outer join may simplify the logic and make it safer.
But the way to do it is not apparent now. May be if we implement a new
technique of query tree reduction, the approach will become more evident.Could you, please, elaborate more on what you mean by "new technique
of query tree reduction"?I mean any transformations and optimisations that reduce search space
for optimisation. Right now, I see the features reduce_unique_semijoins,
remove_useless_joins, and remove_useless_self_joins.
In practice, I see at least a join on a foreign key, where some cases
potentially allow the removal of the JOIN operator.
Do you mean some generic facility, which generalizes all the
transformations you mentioned? If so, it would be cool. But how
could it look like?
------
Regards,
Alexander Korotkov
Supabase
On 26/2/2025 13:14, Alexander Korotkov wrote:
On Mon, Feb 24, 2025 at 2:22 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 24/2/2025 11:57, Alexander Korotkov wrote:
Could you, please, elaborate more on what you mean by "new technique
of query tree reduction"?I mean any transformations and optimisations that reduce search space
for optimisation. Right now, I see the features reduce_unique_semijoins,
remove_useless_joins, and remove_useless_self_joins.
In practice, I see at least a join on a foreign key, where some cases
potentially allow the removal of the JOIN operator.Do you mean some generic facility, which generalizes all the
transformations you mentioned? If so, it would be cool. But how
could it look like?
I think we may realise what it may look like by attempting to implement
more RelOptInfo-removal features. Right now, I have only vague ideas on
that subject. Just for reference, there were other discussions on join
removal [1 - 5] and some blog posts explaining Oracle techniques in this
area - see, for example, [6]Oracle Join Elimination https://oracle-base.com/articles/misc/join-elimination.
[1]: inner join removal /messages/by-id/AANLkTinS_MlZ2F3Siwgcje--qf5nTJTpuHFnZdcA45bU@mail.gmail.com
/messages/by-id/AANLkTinS_MlZ2F3Siwgcje--qf5nTJTpuHFnZdcA45bU@mail.gmail.com
[2]: Patch to support SEMI and ANTI join removal /messages/by-id/CAApHDvpCBEfuc5tD=vniepAv0pU5m=q=fOQZcOdMHeei7OQPgQ@mail.gmail.com
/messages/by-id/CAApHDvpCBEfuc5tD=vniepAv0pU5m=q=fOQZcOdMHeei7OQPgQ@mail.gmail.com
[3]: Removing INNER JOINs /messages/by-id/CAApHDvocUEYdt1uT+DLDPs2xEu=v3qJGT6HeXKonQM4rY_OsSA@mail.gmail.com
/messages/by-id/CAApHDvocUEYdt1uT+DLDPs2xEu=v3qJGT6HeXKonQM4rY_OsSA@mail.gmail.com
[4]: WIP Join Removal /messages/by-id/1220176372.4371.118.camel@ebony.2ndQuadrant
/messages/by-id/1220176372.4371.118.camel@ebony.2ndQuadrant
[5]: Join Removal/ Vertical Partitioning /messages/by-id/1214477827.3845.87.camel@ebony.site
/messages/by-id/1214477827.3845.87.camel@ebony.site
[6]: Oracle Join Elimination https://oracle-base.com/articles/misc/join-elimination
https://oracle-base.com/articles/misc/join-elimination
--
regards, Andrei Lepikhov
Hi.
Em sáb., 15 de fev. de 2025 às 06:20, Alexander Korotkov <
aekorotkov@gmail.com> escreveu:
On Thu, Feb 13, 2025 at 1:00 AM Alexander Korotkov <aekorotkov@gmail.com>
wrote:On Tue, Feb 11, 2025 at 5:31 PM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:Hi! Thank you for the work with this subject, I think it is really
important.On 10.02.2025 22:58, Alexander Korotkov wrote:
Hi!
On Mon, Feb 10, 2025 at 7:19 AM Andrei Lepikhov <lepihov@gmail.com>
wrote:
On 9/2/2025 18:41, Alexander Korotkov wrote:
Regarding adjust_relid_set() and replace_relid(). I think they are
now strictly equivalent, except for the case then old relid isgiven
and not found. In this case adjust_relid_set() returns the
original
relids while replace_relid() returns a copy. The behavior of
adjust_relid_set() appears more desirable as we don't need extra
copying when no modification is done. So, I've replaced all
replace_relid() with adjust_relid_set().Ok, I glanced into it, and it makes sense to merge these routines.
I think the comment to adjust_relid_set() should be arranged, too.See
the attachment for a short variant of such modification.
Also, I did some grammar correction to your new comment in tests.
Thanks!
I've further revised adjust_relid_set() header comment.
Looking back to the work done since previous attempt to commit this
to
pg17, I can highlight following.
1) We're now using more of existing infrastructure including
adjust_relid_set() and ChangeVarNodes(). The most of complexity is
still there though.
2) We've checked few ways to further simplify this patch. But yetthe
current way still feels to be best possible.
3) For sure, several bugs were fixed.I think we could give it another chance for pg18 after some further
polishing (at least commit message still needs to be revised). Any
thoughts on this? Tom?I didn't find any mistakes, I just have a refactoring remark. I think
the part where we add non-redundant expressions with the
binfo_candidates, jinfo_candidates
check can be moved to a separate function, otherwise the code is very
repetitive in this place. I did it and attached diff fileThank you. I've integrated that into a patch. However, I've to
change keep_rinfo_list to be passed by pointer to
add_non_redundant_clauses(), because it might be changed in
distribute_restrictinfo_to_rels(). Without that there is a case of
duplicated clause in regression tests.I've changed 'inner' and 'outer' vise versa in
remove_self_joins_one_group() for better readability (I believe that
was discussed upthread but lost). Also, I did a round of improvement
for comments and commit message.I've corrected some spelling error reported by Alexander Lakhin
privately to me. Also, I've revised comments around ChangeVarNodes()
and ChangeVarNodesExtended(). I'm going to continue nitpicking this
patch during next couple days then push it if no objections.
I think a small optimization is possible here.
The whole block that append *rinfo* to exprs is controlled by extra_clause
not NULL.
So It's worth moving the test to the beginning of the block and avoiding it
altogether if that's the case.
trivial patch attached.
best regards,
Ranier Vilela
Attachments:
avoid-expensive-function-indxpath.patchapplication/octet-stream; name=avoid-expensive-function-indxpath.patchDownload
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index a43ca16d68..c983753f84 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -4286,7 +4286,8 @@ relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
{
matched = true; /* column is unique */
- if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+ if (extra_clauses &&
+ bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
{
MemoryContext oldMemCtx =
MemoryContextSwitchTo(root->planner_cxt);
@@ -4298,8 +4299,8 @@ relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
*/
Assert(bms_is_empty(rinfo->left_relids) ||
bms_is_empty(rinfo->right_relids));
- if (extra_clauses)
- exprs = lappend(exprs, rinfo);
+
+ exprs = lappend(exprs, rinfo);
MemoryContextSwitchTo(oldMemCtx);
}
Hi all,
I've got an off-list bug report from Alexander Lakhin involving a
placeholder variable. Alena and Andrei proposed a fix. It is fairly
simple: we just shouldn't remove PHVs during self-join elimination, as
they might still be referenced from other parts of a query. The patch
is attached. I'm going to fix this if no objections.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v1-0001-Disallow-removing-placeholders-during-Self-Join-E.patchapplication/octet-stream; name=v1-0001-Disallow-removing-placeholders-during-Self-Join-E.patchDownload
From e155fce0fbad1e56404a813151195f55d7cbd997 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 2 Apr 2025 13:10:21 +0300
Subject: [PATCH v1] Disallow removing placeholders during Self-Join
Elimination.
fc069a3a6319 implements Self-Join Elimination (SJE), which can remove base
elations when appropriate. However, regressions tests for SJE only cover
the case when placeholder variables (PHVs) are evaluated and needed only
in a single base rel. If this baserel is removed due to SJE, its clauses,
including PHVs, will be transferred to the keeping relation. Removing these
PHVs may trigger an error on plan creation - thanks to the b3ff6c742f6c for
detecting that.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Andrei Lepikhov <lepihov@gmail.com>
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
---
src/backend/optimizer/plan/analyzejoins.c | 12 +++++++--
src/test/regress/expected/join.out | 31 +++++++++++++++++++++++
src/test/regress/sql/join.sql | 20 +++++++++++++++
3 files changed, 61 insertions(+), 2 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 8a8d4a2af33..492ac17b928 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -420,10 +420,18 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (sjinfo != NULL &&
+ bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- (sjinfo == NULL || !bms_is_member(sjinfo->ojrelid, phinfo->ph_eval_at)))
+ !bms_is_member(sjinfo->ojrelid, phinfo->ph_eval_at))
{
+ /*
+ * This code shouldn't be executed if one relation is substituted
+ * with another: in this case, the placeholder may be employed in
+ * a filter inside the scan node the SJE removes.
+ */
+ Assert(subst < 0);
+
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
root->placeholder_array[phinfo->phid] = NULL;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 14da5708451..d3232c75f2e 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -7200,6 +7200,37 @@ on true;
Output: t3.id, t1.id
(7 rows)
+-- This is a degenerate case of PHV usage: it is evaluated and needed inside
+-- a baserel scan operation that the SJE removes.
+-- The PHV in this test should be in the filter of parameterised Index Scan:
+-- replace_nestloop_params code will detect if the placeholder list doesn't have
+-- a reference to this parameter.
+--
+-- NOTE: enable_hashjoin and enable_mergejoin must be disabled.
+CREATE TABLE tbl_phv(x int, y int PRIMARY KEY);
+CREATE INDEX tbl_phv_idx ON tbl_phv(x);
+INSERT INTO tbl_phv (x, y)
+ SELECT gs, gs FROM generate_series(1,100) AS gs;
+VACUUM ANALYZE tbl_phv;
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT 1 FROM tbl_phv t1 LEFT JOIN
+ (SELECT 1 extra, x, y FROM tbl_phv tl) t3 JOIN
+ (SELECT y FROM tbl_phv tr) t4
+ ON t4.y = t3.y
+ON true WHERE t3.extra IS NOT NULL AND t3.x=t1.x % 2;
+ QUERY PLAN
+---------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.tbl_phv t1
+ Output: t1.x, t1.y
+ -> Index Scan using tbl_phv_idx on public.tbl_phv tr
+ Output: tr.x, tr.y
+ Index Cond: (tr.x = (t1.x % 2))
+ Filter: (1 IS NOT NULL)
+(8 rows)
+
+DROP TABLE IF EXISTS tbl_phv;
-- Check that SJE replaces join clauses involving the removed rel correctly
explain (costs off)
select * from emp1 t1
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c29d13b9fed..a9cdd4fe7b9 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2774,6 +2774,26 @@ select * from generate_series(1,10) t1(id) left join
lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
on true;
+-- This is a degenerate case of PHV usage: it is evaluated and needed inside
+-- a baserel scan operation that the SJE removes.
+-- The PHV in this test should be in the filter of parameterised Index Scan:
+-- replace_nestloop_params code will detect if the placeholder list doesn't have
+-- a reference to this parameter.
+--
+-- NOTE: enable_hashjoin and enable_mergejoin must be disabled.
+CREATE TABLE tbl_phv(x int, y int PRIMARY KEY);
+CREATE INDEX tbl_phv_idx ON tbl_phv(x);
+INSERT INTO tbl_phv (x, y)
+ SELECT gs, gs FROM generate_series(1,100) AS gs;
+VACUUM ANALYZE tbl_phv;
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT 1 FROM tbl_phv t1 LEFT JOIN
+ (SELECT 1 extra, x, y FROM tbl_phv tl) t3 JOIN
+ (SELECT y FROM tbl_phv tr) t4
+ ON t4.y = t3.y
+ON true WHERE t3.extra IS NOT NULL AND t3.x=t1.x % 2;
+DROP TABLE IF EXISTS tbl_phv;
+
-- Check that SJE replaces join clauses involving the removed rel correctly
explain (costs off)
select * from emp1 t1
--
2.39.5 (Apple Git-154)
On Fri, Apr 4, 2025 at 1:02 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I've got an off-list bug report from Alexander Lakhin involving a
placeholder variable. Alena and Andrei proposed a fix. It is fairly
simple: we just shouldn't remove PHVs during self-join elimination, as
they might still be referenced from other parts of a query. The patch
is attached. I'm going to fix this if no objections.
Hmm, I'm not sure about the fix. It seems to me that it simply
prevents removing any PHVs in the self-join removal case. My concern
is that this might result in PHVs that could actually be removed not
being removed in many cases.
Besides, there's the specific comment above this code explaining the
logic behind the removal of PHVs. Shouldn't that comment be updated
to reflect the changes?
Thanks
Richard
On 4/4/25 04:53, Richard Guo wrote:
On Fri, Apr 4, 2025 at 1:02 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I've got an off-list bug report from Alexander Lakhin involving a
placeholder variable. Alena and Andrei proposed a fix. It is fairly
simple: we just shouldn't remove PHVs during self-join elimination, as
they might still be referenced from other parts of a query. The patch
is attached. I'm going to fix this if no objections.Hmm, I'm not sure about the fix. It seems to me that it simply
prevents removing any PHVs in the self-join removal case. My concern
is that this might result in PHVs that could actually be removed not
being removed in many cases.
Let's play with use cases:
If a PHV is needed in the inner or outer only, it means we have a clause
in the baserestrictinfo that will be transferred to the keeping
relation, and we shouldn't remove the PHV.
Another case is when the PHV is needed in a join clause of the
self-join. I may imagine such a case:
toKeep.x+toRemove.y=PHV
This clause will be transformed to "toKeep.x+toKeep.y=PHV", pushed to
baserestrictinfo of keeping relation and should be saved.
I think it is possible to invent quite a narrow case of clause like the
following:
PHV_evaluated_at_inner = PHV_evaluated_at_outer
It needs to prove reproducibility. But even if it makes sense, it seems
to have no danger for further selectivity estimation compared to the
source clause and is a too-narrow case, isn't it?
In other cases, this PHV is needed something else, and we can't remove it.
Maybe I lost the case you keep in mind? I would like to discover it.
Besides, there's the specific comment above this code explaining the
logic behind the removal of PHVs. Shouldn't that comment be updated
to reflect the changes?
It makes sense: for now, it seems that PHV removal should be used in the
case of an outer join removal. In the case of SJE, logically we make a
replacement, not a removal, and we should not reduce the number of
entities involved.
--
regards, Andrei Lepikhov
On Fri, Apr 4, 2025 at 11:35 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 4/4/25 04:53, Richard Guo wrote:
On Fri, Apr 4, 2025 at 1:02 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I've got an off-list bug report from Alexander Lakhin involving a
placeholder variable. Alena and Andrei proposed a fix. It is fairly
simple: we just shouldn't remove PHVs during self-join elimination, as
they might still be referenced from other parts of a query. The patch
is attached. I'm going to fix this if no objections.Hmm, I'm not sure about the fix. It seems to me that it simply
prevents removing any PHVs in the self-join removal case. My concern
is that this might result in PHVs that could actually be removed not
being removed in many cases.Let's play with use cases:
If a PHV is needed in the inner or outer only, it means we have a clause
in the baserestrictinfo that will be transferred to the keeping
relation, and we shouldn't remove the PHV.
Another case is when the PHV is needed in a join clause of the
self-join. I may imagine such a case:toKeep.x+toRemove.y=PHV
This clause will be transformed to "toKeep.x+toKeep.y=PHV", pushed to
baserestrictinfo of keeping relation and should be saved.
I think it is possible to invent quite a narrow case of clause like the
following:PHV_evaluated_at_inner = PHV_evaluated_at_outer
It needs to prove reproducibility. But even if it makes sense, it seems
to have no danger for further selectivity estimation compared to the
source clause and is a too-narrow case, isn't it?
In other cases, this PHV is needed something else, and we can't remove it.
Should we add more regression tests covering these cases?
Besides, there's the specific comment above this code explaining the
logic behind the removal of PHVs. Shouldn't that comment be updated
to reflect the changes?It makes sense: for now, it seems that PHV removal should be used in the
case of an outer join removal. In the case of SJE, logically we make a
replacement, not a removal, and we should not reduce the number of
entities involved.
I have added a brief comment about that. Check the attached patch.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v2-0001-Disallow-removing-placeholders-during-Self-Join-E.patchapplication/octet-stream; name=v2-0001-Disallow-removing-placeholders-during-Self-Join-E.patchDownload
From f2f020b97a4065e6c2da148e26e089bea86b77e4 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 2 Apr 2025 13:10:21 +0300
Subject: [PATCH v2] Disallow removing placeholders during Self-Join
Elimination.
fc069a3a6319 implements Self-Join Elimination (SJE), which can remove base
elations when appropriate. However, regressions tests for SJE only cover
the case when placeholder variables (PHVs) are evaluated and needed only
in a single base rel. If this baserel is removed due to SJE, its clauses,
including PHVs, will be transferred to the keeping relation. Removing these
PHVs may trigger an error on plan creation - thanks to the b3ff6c742f6c for
detecting that.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
---
src/backend/optimizer/plan/analyzejoins.c | 16 +++++++++---
src/test/regress/expected/join.out | 31 +++++++++++++++++++++++
src/test/regress/sql/join.sql | 20 +++++++++++++++
3 files changed, 64 insertions(+), 3 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index ae20691ca91..afd58728e33 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -403,7 +403,9 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
/*
* Likewise remove references from PlaceHolderVar data structures,
- * removing any no-longer-needed placeholders entirely.
+ * removing any no-longer-needed placeholders entirely. We remove PHV
+ * only for left-join removal. With self-join elimination, PHVs get
+ * moved to the remaining rel, where it might still be needed.
*
* Removal is a bit trickier than it might seem: we can remove PHVs that
* are used at the target rel and/or in the join qual, but not those that
@@ -420,10 +422,18 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (sjinfo != NULL &&
+ bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- (sjinfo == NULL || !bms_is_member(sjinfo->ojrelid, phinfo->ph_eval_at)))
+ !bms_is_member(sjinfo->ojrelid, phinfo->ph_eval_at))
{
+ /*
+ * This code shouldn't be executed if one relation is substituted
+ * with another: in this case, the placeholder may be employed in
+ * a filter inside the scan node the SJE removes.
+ */
+ Assert(subst < 0);
+
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
root->placeholder_array[phinfo->phid] = NULL;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 14da5708451..0814b4b80cb 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -7200,6 +7200,37 @@ on true;
Output: t3.id, t1.id
(7 rows)
+-- This is a degenerate case of PHV usage: it is evaluated and needed inside
+-- a baserel scan operation that the SJE removes.
+-- The PHV in this test should be in the filter of parameterised Index Scan:
+-- replace_nestloop_params() code will detect if the placeholder list doesn't
+-- have a reference to this parameter.
+--
+-- NOTE: enable_hashjoin and enable_mergejoin must be disabled.
+CREATE TABLE tbl_phv(x int, y int PRIMARY KEY);
+CREATE INDEX tbl_phv_idx ON tbl_phv(x);
+INSERT INTO tbl_phv (x, y)
+ SELECT gs, gs FROM generate_series(1,100) AS gs;
+VACUUM ANALYZE tbl_phv;
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT 1 FROM tbl_phv t1 LEFT JOIN
+ (SELECT 1 extra, x, y FROM tbl_phv tl) t3 JOIN
+ (SELECT y FROM tbl_phv tr) t4
+ ON t4.y = t3.y
+ON true WHERE t3.extra IS NOT NULL AND t3.x = t1.x % 2;
+ QUERY PLAN
+---------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.tbl_phv t1
+ Output: t1.x, t1.y
+ -> Index Scan using tbl_phv_idx on public.tbl_phv tr
+ Output: tr.x, tr.y
+ Index Cond: (tr.x = (t1.x % 2))
+ Filter: (1 IS NOT NULL)
+(8 rows)
+
+DROP TABLE IF EXISTS tbl_phv;
-- Check that SJE replaces join clauses involving the removed rel correctly
explain (costs off)
select * from emp1 t1
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c29d13b9fed..d7d76fec57e 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2774,6 +2774,26 @@ select * from generate_series(1,10) t1(id) left join
lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
on true;
+-- This is a degenerate case of PHV usage: it is evaluated and needed inside
+-- a baserel scan operation that the SJE removes.
+-- The PHV in this test should be in the filter of parameterised Index Scan:
+-- replace_nestloop_params() code will detect if the placeholder list doesn't
+-- have a reference to this parameter.
+--
+-- NOTE: enable_hashjoin and enable_mergejoin must be disabled.
+CREATE TABLE tbl_phv(x int, y int PRIMARY KEY);
+CREATE INDEX tbl_phv_idx ON tbl_phv(x);
+INSERT INTO tbl_phv (x, y)
+ SELECT gs, gs FROM generate_series(1,100) AS gs;
+VACUUM ANALYZE tbl_phv;
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT 1 FROM tbl_phv t1 LEFT JOIN
+ (SELECT 1 extra, x, y FROM tbl_phv tl) t3 JOIN
+ (SELECT y FROM tbl_phv tr) t4
+ ON t4.y = t3.y
+ON true WHERE t3.extra IS NOT NULL AND t3.x = t1.x % 2;
+DROP TABLE IF EXISTS tbl_phv;
+
-- Check that SJE replaces join clauses involving the removed rel correctly
explain (costs off)
select * from emp1 t1
--
2.39.5 (Apple Git-154)
On 05.04.2025 13:09, Alexander Korotkov wrote:
On Fri, Apr 4, 2025 at 11:35 AM Andrei Lepikhov<lepihov@gmail.com> wrote:
On 4/4/25 04:53, Richard Guo wrote:
On Fri, Apr 4, 2025 at 1:02 AM Alexander Korotkov<aekorotkov@gmail.com> wrote:
I've got an off-list bug report from Alexander Lakhin involving a
placeholder variable. Alena and Andrei proposed a fix. It is fairly
simple: we just shouldn't remove PHVs during self-join elimination, as
they might still be referenced from other parts of a query. The patch
is attached. I'm going to fix this if no objections.Hmm, I'm not sure about the fix. It seems to me that it simply
prevents removing any PHVs in the self-join removal case. My concern
is that this might result in PHVs that could actually be removed not
being removed in many cases.Let's play with use cases:
If a PHV is needed in the inner or outer only, it means we have a clause
in the baserestrictinfo that will be transferred to the keeping
relation, and we shouldn't remove the PHV.
Another case is when the PHV is needed in a join clause of the
self-join. I may imagine such a case:toKeep.x+toRemove.y=PHV
This clause will be transformed to "toKeep.x+toKeep.y=PHV", pushed to
baserestrictinfo of keeping relation and should be saved.
I think it is possible to invent quite a narrow case of clause like the
following:PHV_evaluated_at_inner = PHV_evaluated_at_outer
It needs to prove reproducibility. But even if it makes sense, it seems
to have no danger for further selectivity estimation compared to the
source clause and is a too-narrow case, isn't it?
In other cases, this PHV is needed something else, and we can't remove it.
I agree with that, and I also believe this case will be quite rare in
practice.
Should we add more regression tests covering these cases?
I experimented with some examples like this and noticed that it does
affect cardinality estimation, though I'm not sure the impact is
significant.
I used the tables from the regression tests, so if they’re appropriate
for reproducing this case, it should be straightforward to add them.
EXPLAIN (analyze, VERBOSE)
SELECT 1
FROM tbl_phv t1
LEFT JOIN (
SELECT 1 AS extra, x, y FROM tbl_phv tl
) t3
JOIN (
SELECT y, x FROM tbl_phv tr
) t4 ON t4.y = t3.y and (t4.x*0) = (t3.x *0)
ON true
WHERE t3.extra IS NOT NULL
AND t4.x + t3.y= (t1.x * 0);
The query plan with transformation:
------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3.27..7.16 rows=100 width=4) (actual
time=0.149..0.151 rows=0.00 loops=1)
Output: 1
Hash Cond: ((tr.x + tr.y) = (t1.x * 0))
Buffers: shared hit=2
-> Seq Scan on public.tbl_phv tr (cost=0.00..2.26 rows=100
width=8) (actual time=0.027..0.047 rows=100.00 loops=1)
Output: tr.x, tr.y
Filter: ((1 IS NOT NULL) AND ((tr.x * 0) IS NOT NULL))
Rows Removed by Filter: 1
Buffers: shared hit=1
-> Hash (cost=2.01..2.01 rows=101 width=4) (actual
time=0.078..0.079 rows=100.00 loops=1)
Output: t1.x
Buckets: 1024 Batches: 1 Memory Usage: 12kB
Buffers: shared hit=1
-> Seq Scan on public.tbl_phv t1 (cost=0.00..2.01 rows=101
width=4) (actual time=0.005..0.038 rows=101.00 loops=1)
Output: t1.x
Buffers: shared hit=1
Planning Time: 0.607 ms
Execution Time: 0.194 ms
(18 rows)
The query plan without transformation:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=6.09..8.74 rows=1 width=4) (actual time=0.248..0.250
rows=0.00 loops=1)
Output:1
Hash Cond:((t1.x * 0) = (tr.x + tl.y))
Buffers:shared hit=3
-> Seq Scan on public.tbl_phv t1 (cost=0.00..2.01 rows=101 width=4)
(actual time=0.011..0.021 rows=101.00 loops=1)
Output:t1.x, t1.y
Buffers:shared hit=1
-> Hash (cost=6.08..6.08 rows=1 width=8) (actual time=0.206..0.207
rows=100.00 loops=1)
Output:tl.y, tr.x
Buckets:1024 Batches:1 Memory Usage:12kB
Buffers:shared hit=2
-> Hash Join (cost=3.51..6.08 rows=1 width=8) (actual time=0.090..0.160
rows=100.00 loops=1)
Output:tl.y, tr.x
Inner Unique:true
Hash Cond:((tr.y = tl.y) AND ((tr.x * 0) = (tl.x * 0)))
Buffers:shared hit=2
-> Seq Scan on public.tbl_phv tr (cost=0.00..2.01 rows=101 width=8)
(actual time=0.005..0.016 rows=101.00 loops=1)
Output:tr.x, tr.y
Buffers:shared hit=1
-> Hash (cost=2.01..2.01 rows=100 width=8) (actual time=0.080..0.080
rows=100.00 loops=1)
Output:tl.y, tl.x
Buckets:1024 Batches:1 Memory Usage:12kB
Buffers:shared hit=1
-> Seq Scan on public.tbl_phv tl (cost=0.00..2.01 rows=100 width=8)
(actual time=0.008..0.035 rows=101.00 loops=1)
Output:tl.y, tl.x
Filter:(1 IS NOT NULL)
Buffers:shared hit=1
Planning:
Buffers:shared hit=25
Planning Time:0.609 ms
Execution Time:0.319 ms
(31 rows)
EXPLAIN (analyze, VERBOSE)
SELECT 1
FROM tbl_phv t1
LEFT JOIN (
SELECT 1 AS extra, x, y FROM tbl_phv tl
) t3
JOIN (
SELECT y, x FROM tbl_phv tr
) t4 ON t4.y = t3.y
ON true
WHERE t3.extra IS NOT NULL
AND t4.x + t3.y= (t1.x * 0);
The query plan with transformation:
------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3.26..6.90 rows=100 width=4) (actual
time=0.145..0.146 rows=0.00 loops=1)
Output: 1
Hash Cond: ((t1.x * 0) = (tr.x + tr.y))
Buffers: shared hit=2
-> Seq Scan on public.tbl_phv t1 (cost=0.00..2.01 rows=101
width=4) (actual time=0.030..0.040 rows=101.00 loops=1)
Output: t1.x, t1.y
Buffers: shared hit=1
-> Hash (cost=2.01..2.01 rows=100 width=8) (actual
time=0.081..0.081 rows=100.00 loops=1)
Output: tr.y, tr.x
Buckets: 1024 Batches: 1 Memory Usage: 12kB
Buffers: shared hit=1
-> Seq Scan on public.tbl_phv tr (cost=0.00..2.01 rows=100
width=8) (actual time=0.012..0.040 rows=101.00 loops=1)
Output: tr.y, tr.x
Filter: (1 IS NOT NULL)
Buffers: shared hit=1
Planning Time: 0.571 ms
Execution Time: 0.195 ms
(17 rows)
The query plan without transformation:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=6.53..9.94 rows=50 width=4) (actual time=0.236..0.238
rows=0.00 loops=1)
Output:1
Hash Cond:((tr.x + tl.y) = (t1.x * 0))
Buffers:shared hit=3
-> Hash Join (cost=3.26..5.55 rows=100 width=8) (actual
time=0.075..0.137 rows=101.00 loops=1)
Output:tl.y, tr.x
Inner Unique:true
Hash Cond:(tr.y = tl.y)
Buffers:shared hit=2
-> Seq Scan on public.tbl_phv tr (cost=0.00..2.01 rows=101 width=8)
(actual time=0.005..0.017 rows=101.00 loops=1)
Output:tr.x, tr.y
Buffers:shared hit=1
-> Hash (cost=2.01..2.01 rows=100 width=4) (actual time=0.064..0.065
rows=101.00 loops=1)
Output:tl.y
Buckets:1024 Batches:1 Memory Usage:12kB
Buffers:shared hit=1
-> Seq Scan on public.tbl_phv tl (cost=0.00..2.01 rows=100 width=4)
(actual time=0.006..0.033 rows=101.00 loops=1)
Output:tl.y
Filter:(1 IS NOT NULL)
Buffers:shared hit=1
-> Hash (cost=2.01..2.01 rows=101 width=4) (actual time=0.078..0.078
rows=100.00 loops=1)
Output:t1.x
Buckets:1024 Batches:1 Memory Usage:12kB
Buffers:shared hit=1
-> Seq Scan on public.tbl_phv t1 (cost=0.00..2.01 rows=101 width=4)
(actual time=0.014..0.039 rows=101.00 loops=1)
Output:t1.x
Buffers:shared hit=1
Planning:
Buffers:shared hit=12
Planning Time:0.478 ms
Execution Time:0.293 ms
(31 rows)
--
Regards,
Alena Rybakina
Postgres Professional
Hi,
I find that the postgresql.conf.sample file doesn't contain
enable_self_join_elimination guc.
If this is necessary, please see the attached patch.
--
Thanks, Tender Wang
Attachments:
0001-Put-enable_self_join_elimination-into-postgresql.con.patchtext/plain; charset=US-ASCII; name=0001-Put-enable_self_join_elimination-into-postgresql.con.patchDownload
From f27c99aebbd07d4008173492c7913749b149b540 Mon Sep 17 00:00:00 2001
From: Tender Wang <tndrwang@gmail.com>
Date: Sun, 6 Apr 2025 11:54:49 +0800
Subject: [PATCH] Put enable_self_join_elimination into postgresql.conf.sample
---
src/backend/utils/misc/postgresql.conf.sample | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index ff56a1f0732..bcd4e67f43e 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -427,6 +427,7 @@
#enable_tidscan = on
#enable_group_by_reordering = on
#enable_distinct_reordering = on
+#enable_self_join_elimination = on
# - Planner Cost Constants -
--
2.34.1
On 6 Apr 2025, at 06:02, Tender Wang <tndrwang@gmail.com> wrote:
I find that the postgresql.conf.sample file doesn't contain enable_self_join_elimination guc.
If this is necessary, please see the attached patch.
The GUC is marked as not supposed by the in the sample file, either it really
shouldn't or that needs to be removed from the GUC declaration.
--
Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> 于2025年4月6日周日 19:23写道:
On 6 Apr 2025, at 06:02, Tender Wang <tndrwang@gmail.com> wrote:
I find that the postgresql.conf.sample file doesn't contain
enable_self_join_elimination guc.
If this is necessary, please see the attached patch.
The GUC is marked as not supposed by the in the sample file, either it
really
shouldn't or that needs to be removed from the GUC declaration.
Sorry, I didn't see the GUC_NOT_IN_SAMPLE flag. Thanks for the reminder.
Please ignore the noise.
--
Thanks, Tender Wang
On Sun, Apr 6, 2025 at 2:42 PM Tender Wang <tndrwang@gmail.com> wrote:
Daniel Gustafsson <daniel@yesql.se> 于2025年4月6日周日 19:23写道:
On 6 Apr 2025, at 06:02, Tender Wang <tndrwang@gmail.com> wrote:
I find that the postgresql.conf.sample file doesn't contain enable_self_join_elimination guc.
If this is necessary, please see the attached patch.The GUC is marked as not supposed by the in the sample file, either it really
shouldn't or that needs to be removed from the GUC declaration.Sorry, I didn't see the GUC_NOT_IN_SAMPLE flag. Thanks for the reminder.
Please ignore the noise.
Also, sorry for thoughtless commit of that.
Nevertheless, should we consider revisiting this flag? I see the only
other GUC simultaneously QUERY_TUNING_METHOD and GUC_NOT_IN_SAMPLE is
optimize_bounded_sort, which is not exposed in a standard build.
------
Regards,
Alexander Korotkov
Supabase
Alexander Korotkov <aekorotkov@gmail.com> 于2025年4月6日周日 19:50写道:
On Sun, Apr 6, 2025 at 2:42 PM Tender Wang <tndrwang@gmail.com> wrote:
Daniel Gustafsson <daniel@yesql.se> 于2025年4月6日周日 19:23写道:
On 6 Apr 2025, at 06:02, Tender Wang <tndrwang@gmail.com> wrote:
I find that the postgresql.conf.sample file doesn't contain
enable_self_join_elimination guc.
If this is necessary, please see the attached patch.
The GUC is marked as not supposed by the in the sample file, either it
really
shouldn't or that needs to be removed from the GUC declaration.
Sorry, I didn't see the GUC_NOT_IN_SAMPLE flag. Thanks for the reminder.
Please ignore the noise.Also, sorry for thoughtless commit of that.
Nevertheless, should we consider revisiting this flag? I see the only
other GUC simultaneously QUERY_TUNING_METHOD and GUC_NOT_IN_SAMPLE is
optimize_bounded_sort, which is not exposed in a standard build.
Agree.
Putting enable_self_join_elimination into sample file seems a good choice.
--
Thanks, Tender Wang
Alexander Korotkov <aekorotkov@gmail.com> writes:
Nevertheless, should we consider revisiting this flag? I see the only
other GUC simultaneously QUERY_TUNING_METHOD and GUC_NOT_IN_SAMPLE is
optimize_bounded_sort, which is not exposed in a standard build.
enable_self_join_elimination is documented, and it has no hint that
it is any different from any other enable_foo flag. It shouldn't
be different.
regards, tom lane
On Sun, Apr 6, 2025 at 5:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
Nevertheless, should we consider revisiting this flag? I see the only
other GUC simultaneously QUERY_TUNING_METHOD and GUC_NOT_IN_SAMPLE is
optimize_bounded_sort, which is not exposed in a standard build.enable_self_join_elimination is documented, and it has no hint that
it is any different from any other enable_foo flag. It shouldn't
be different.
Thank you for your point, Tom. Will do this later today.
------
Regards,
Alexander Korotkov
Supabase
FWIW, I reported some issues with this commit in [1]/messages/by-id/CAMbWs49PE3CvnV8vrQ0Dr=HqgZZmX0tdNbzVNJxqc8yg-8kDQQ@mail.gmail.com. Any thoughts on
how to fix them?
[1]: /messages/by-id/CAMbWs49PE3CvnV8vrQ0Dr=HqgZZmX0tdNbzVNJxqc8yg-8kDQQ@mail.gmail.com
Thanks
Richard
Hi, Richard!
On Mon, Apr 7, 2025 at 6:12 AM Richard Guo <guofenglinux@gmail.com> wrote:
FWIW, I reported some issues with this commit in [1]. Any thoughts on
how to fix them?[1] /messages/by-id/CAMbWs49PE3CvnV8vrQ0Dr=HqgZZmX0tdNbzVNJxqc8yg-8kDQQ@mail.gmail.com
Thank you for pointing. Looking into that!
------
Regards,
Alexander Korotkov
Supabase
Hi, Alena!
On Sun, Apr 6, 2025 at 12:02 AM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:
Should we add more regression tests covering these cases?
I experimented with some examples like this and noticed that it does affect cardinality estimation, though I'm not sure the impact is significant.
I used the tables from the regression tests, so if they’re appropriate for reproducing this case, it should be straightforward to add them.
Thank you for your feedback. I've check the cases you've provided. I
found that the differences here are related to the SJE itself, not to
changes regarding PHVs handling. I think it generally OK that
estimates are somewhat changed due to such significant query
transformation. Hopefully they should be improved in the majority of
cases.
I did some improvements to PHVs patch: revised comments and commit
message. I'm going to push it if no objections.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v2-0001-Disallow-removing-placeholders-during-Self-Join-E.patchapplication/octet-stream; name=v2-0001-Disallow-removing-placeholders-during-Self-Join-E.patchDownload
From f2fc6b6513f43dc11e46e522b0453457ff01c32d Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 2 Apr 2025 13:10:21 +0300
Subject: [PATCH v2] Disallow removing placeholders during Self-Join
Elimination.
fc069a3a6319 implements Self-Join Elimination (SJE), which can remove base
elations when appropriate. However, regressions tests for SJE only cover
the case when placeholder variables (PHVs) are evaluated and needed only
in a single base rel. If this baserel is removed due to SJE, its clauses,
including PHVs, will be transferred to the keeping relation. Removing these
PHVs may trigger an error on plan creation - thanks to the b3ff6c742f6c for
detecting that.
It might also happen that we skip the removal of some PHVs that could be
removed. However, the overhead of extra PHVs is small compared to the
complexity of analysis needed to remove them.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
---
src/backend/optimizer/plan/analyzejoins.c | 17 ++++++++++--
src/test/regress/expected/join.out | 34 ++++++++++++++++++++++-
src/test/regress/sql/join.sql | 23 ++++++++++++++-
3 files changed, 69 insertions(+), 5 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 6b58567f511..24356450e01 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -403,7 +403,12 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
/*
* Likewise remove references from PlaceHolderVar data structures,
- * removing any no-longer-needed placeholders entirely.
+ * removing any no-longer-needed placeholders entirely. We remove PHV
+ * only for left-join removal. With self-join elimination, PHVs already
+ * get moved to the remaining relation, where they might still be needed.
+ * It might also happen that we skip the removal of some PHVs that could
+ * be removed. However, the overhead of extra PHVs is small compared to
+ * the complexity of analysis needed to remove them.
*
* Removal is a bit trickier than it might seem: we can remove PHVs that
* are used at the target rel and/or in the join qual, but not those that
@@ -420,10 +425,16 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (sjinfo != NULL &&
+ bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- (sjinfo == NULL || !bms_is_member(sjinfo->ojrelid, phinfo->ph_eval_at)))
+ !bms_is_member(sjinfo->ojrelid, phinfo->ph_eval_at))
{
+ /*
+ * This code shouldn't be executed if one relation is substituted
+ * with another: in this case, the placeholder may be employed in
+ * a filter inside the scan node the SJE removes.
+ */
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
root->placeholder_array[phinfo->phid] = NULL;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 14da5708451..306fb8cbe79 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -7150,7 +7150,8 @@ on true;
-> Seq Scan on emp1 t4
(7 rows)
--- Check that SJE removes the whole PHVs correctly
+-- Try PHV, which could potentially be removed completely by SJE, but that's
+-- not implemented yet.
explain (verbose, costs off)
select 1 from emp1 t1 left join
((select 1 as x, * from emp1 t2) s1 inner join
@@ -7200,6 +7201,37 @@ on true;
Output: t3.id, t1.id
(7 rows)
+-- This is a degenerate case of PHV usage: it is evaluated and needed inside
+-- a baserel scan operation that the SJE removes.
+-- The PHV in this test should be in the filter of parameterised Index Scan:
+-- replace_nestloop_params() code will detect if the placeholder list doesn't
+-- have a reference to this parameter.
+--
+-- NOTE: enable_hashjoin and enable_mergejoin must be disabled.
+CREATE TABLE tbl_phv(x int, y int PRIMARY KEY);
+CREATE INDEX tbl_phv_idx ON tbl_phv(x);
+INSERT INTO tbl_phv (x, y)
+ SELECT gs, gs FROM generate_series(1,100) AS gs;
+VACUUM ANALYZE tbl_phv;
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT 1 FROM tbl_phv t1 LEFT JOIN
+ (SELECT 1 extra, x, y FROM tbl_phv tl) t3 JOIN
+ (SELECT y FROM tbl_phv tr) t4
+ ON t4.y = t3.y
+ON true WHERE t3.extra IS NOT NULL AND t3.x = t1.x % 2;
+ QUERY PLAN
+---------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.tbl_phv t1
+ Output: t1.x, t1.y
+ -> Index Scan using tbl_phv_idx on public.tbl_phv tr
+ Output: tr.x, tr.y
+ Index Cond: (tr.x = (t1.x % 2))
+ Filter: (1 IS NOT NULL)
+(8 rows)
+
+DROP TABLE IF EXISTS tbl_phv;
-- Check that SJE replaces join clauses involving the removed rel correctly
explain (costs off)
select * from emp1 t1
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c29d13b9fed..85af1078659 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2756,7 +2756,8 @@ select * from emp1 t1 left join
on true)
on true;
--- Check that SJE removes the whole PHVs correctly
+-- Try PHV, which could potentially be removed completely by SJE, but that's
+-- not implemented yet.
explain (verbose, costs off)
select 1 from emp1 t1 left join
((select 1 as x, * from emp1 t2) s1 inner join
@@ -2774,6 +2775,26 @@ select * from generate_series(1,10) t1(id) left join
lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
on true;
+-- This is a degenerate case of PHV usage: it is evaluated and needed inside
+-- a baserel scan operation that the SJE removes.
+-- The PHV in this test should be in the filter of parameterised Index Scan:
+-- replace_nestloop_params() code will detect if the placeholder list doesn't
+-- have a reference to this parameter.
+--
+-- NOTE: enable_hashjoin and enable_mergejoin must be disabled.
+CREATE TABLE tbl_phv(x int, y int PRIMARY KEY);
+CREATE INDEX tbl_phv_idx ON tbl_phv(x);
+INSERT INTO tbl_phv (x, y)
+ SELECT gs, gs FROM generate_series(1,100) AS gs;
+VACUUM ANALYZE tbl_phv;
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT 1 FROM tbl_phv t1 LEFT JOIN
+ (SELECT 1 extra, x, y FROM tbl_phv tl) t3 JOIN
+ (SELECT y FROM tbl_phv tr) t4
+ ON t4.y = t3.y
+ON true WHERE t3.extra IS NOT NULL AND t3.x = t1.x % 2;
+DROP TABLE IF EXISTS tbl_phv;
+
-- Check that SJE replaces join clauses involving the removed rel correctly
explain (costs off)
select * from emp1 t1
--
2.39.5 (Apple Git-154)
On Sat, Apr 26, 2025 at 11:04 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Sun, Apr 6, 2025 at 12:02 AM Alena Rybakina
<a.rybakina@postgrespro.ru> wrote:Should we add more regression tests covering these cases?
I experimented with some examples like this and noticed that it does affect cardinality estimation, though I'm not sure the impact is significant.
I used the tables from the regression tests, so if they’re appropriate for reproducing this case, it should be straightforward to add them.Thank you for your feedback. I've check the cases you've provided. I
found that the differences here are related to the SJE itself, not to
changes regarding PHVs handling. I think it generally OK that
estimates are somewhat changed due to such significant query
transformation. Hopefully they should be improved in the majority of
cases.I did some improvements to PHVs patch: revised comments and commit
message. I'm going to push it if no objections.
Uh, v2 was there already. That should be v3.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v3-0001-Disallow-removing-placeholders-during-Self-Join-E.patchapplication/octet-stream; name=v3-0001-Disallow-removing-placeholders-during-Self-Join-E.patchDownload
From f2fc6b6513f43dc11e46e522b0453457ff01c32d Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 2 Apr 2025 13:10:21 +0300
Subject: [PATCH v3] Disallow removing placeholders during Self-Join
Elimination.
fc069a3a6319 implements Self-Join Elimination (SJE), which can remove base
elations when appropriate. However, regressions tests for SJE only cover
the case when placeholder variables (PHVs) are evaluated and needed only
in a single base rel. If this baserel is removed due to SJE, its clauses,
including PHVs, will be transferred to the keeping relation. Removing these
PHVs may trigger an error on plan creation - thanks to the b3ff6c742f6c for
detecting that.
It might also happen that we skip the removal of some PHVs that could be
removed. However, the overhead of extra PHVs is small compared to the
complexity of analysis needed to remove them.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
---
src/backend/optimizer/plan/analyzejoins.c | 17 ++++++++++--
src/test/regress/expected/join.out | 34 ++++++++++++++++++++++-
src/test/regress/sql/join.sql | 23 ++++++++++++++-
3 files changed, 69 insertions(+), 5 deletions(-)
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 6b58567f511..24356450e01 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -403,7 +403,12 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
/*
* Likewise remove references from PlaceHolderVar data structures,
- * removing any no-longer-needed placeholders entirely.
+ * removing any no-longer-needed placeholders entirely. We remove PHV
+ * only for left-join removal. With self-join elimination, PHVs already
+ * get moved to the remaining relation, where they might still be needed.
+ * It might also happen that we skip the removal of some PHVs that could
+ * be removed. However, the overhead of extra PHVs is small compared to
+ * the complexity of analysis needed to remove them.
*
* Removal is a bit trickier than it might seem: we can remove PHVs that
* are used at the target rel and/or in the join qual, but not those that
@@ -420,10 +425,16 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
Assert(sjinfo == NULL || !bms_is_member(relid, phinfo->ph_lateral));
- if (bms_is_subset(phinfo->ph_needed, joinrelids) &&
+ if (sjinfo != NULL &&
+ bms_is_subset(phinfo->ph_needed, joinrelids) &&
bms_is_member(relid, phinfo->ph_eval_at) &&
- (sjinfo == NULL || !bms_is_member(sjinfo->ojrelid, phinfo->ph_eval_at)))
+ !bms_is_member(sjinfo->ojrelid, phinfo->ph_eval_at))
{
+ /*
+ * This code shouldn't be executed if one relation is substituted
+ * with another: in this case, the placeholder may be employed in
+ * a filter inside the scan node the SJE removes.
+ */
root->placeholder_list = foreach_delete_current(root->placeholder_list,
l);
root->placeholder_array[phinfo->phid] = NULL;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 14da5708451..306fb8cbe79 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -7150,7 +7150,8 @@ on true;
-> Seq Scan on emp1 t4
(7 rows)
--- Check that SJE removes the whole PHVs correctly
+-- Try PHV, which could potentially be removed completely by SJE, but that's
+-- not implemented yet.
explain (verbose, costs off)
select 1 from emp1 t1 left join
((select 1 as x, * from emp1 t2) s1 inner join
@@ -7200,6 +7201,37 @@ on true;
Output: t3.id, t1.id
(7 rows)
+-- This is a degenerate case of PHV usage: it is evaluated and needed inside
+-- a baserel scan operation that the SJE removes.
+-- The PHV in this test should be in the filter of parameterised Index Scan:
+-- replace_nestloop_params() code will detect if the placeholder list doesn't
+-- have a reference to this parameter.
+--
+-- NOTE: enable_hashjoin and enable_mergejoin must be disabled.
+CREATE TABLE tbl_phv(x int, y int PRIMARY KEY);
+CREATE INDEX tbl_phv_idx ON tbl_phv(x);
+INSERT INTO tbl_phv (x, y)
+ SELECT gs, gs FROM generate_series(1,100) AS gs;
+VACUUM ANALYZE tbl_phv;
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT 1 FROM tbl_phv t1 LEFT JOIN
+ (SELECT 1 extra, x, y FROM tbl_phv tl) t3 JOIN
+ (SELECT y FROM tbl_phv tr) t4
+ ON t4.y = t3.y
+ON true WHERE t3.extra IS NOT NULL AND t3.x = t1.x % 2;
+ QUERY PLAN
+---------------------------------------------------------
+ Nested Loop
+ Output: 1
+ -> Seq Scan on public.tbl_phv t1
+ Output: t1.x, t1.y
+ -> Index Scan using tbl_phv_idx on public.tbl_phv tr
+ Output: tr.x, tr.y
+ Index Cond: (tr.x = (t1.x % 2))
+ Filter: (1 IS NOT NULL)
+(8 rows)
+
+DROP TABLE IF EXISTS tbl_phv;
-- Check that SJE replaces join clauses involving the removed rel correctly
explain (costs off)
select * from emp1 t1
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c29d13b9fed..85af1078659 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2756,7 +2756,8 @@ select * from emp1 t1 left join
on true)
on true;
--- Check that SJE removes the whole PHVs correctly
+-- Try PHV, which could potentially be removed completely by SJE, but that's
+-- not implemented yet.
explain (verbose, costs off)
select 1 from emp1 t1 left join
((select 1 as x, * from emp1 t2) s1 inner join
@@ -2774,6 +2775,26 @@ select * from generate_series(1,10) t1(id) left join
lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
on true;
+-- This is a degenerate case of PHV usage: it is evaluated and needed inside
+-- a baserel scan operation that the SJE removes.
+-- The PHV in this test should be in the filter of parameterised Index Scan:
+-- replace_nestloop_params() code will detect if the placeholder list doesn't
+-- have a reference to this parameter.
+--
+-- NOTE: enable_hashjoin and enable_mergejoin must be disabled.
+CREATE TABLE tbl_phv(x int, y int PRIMARY KEY);
+CREATE INDEX tbl_phv_idx ON tbl_phv(x);
+INSERT INTO tbl_phv (x, y)
+ SELECT gs, gs FROM generate_series(1,100) AS gs;
+VACUUM ANALYZE tbl_phv;
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT 1 FROM tbl_phv t1 LEFT JOIN
+ (SELECT 1 extra, x, y FROM tbl_phv tl) t3 JOIN
+ (SELECT y FROM tbl_phv tr) t4
+ ON t4.y = t3.y
+ON true WHERE t3.extra IS NOT NULL AND t3.x = t1.x % 2;
+DROP TABLE IF EXISTS tbl_phv;
+
-- Check that SJE replaces join clauses involving the removed rel correctly
explain (costs off)
select * from emp1 t1
--
2.39.5 (Apple Git-154)
On Sat, Apr 26, 2025 at 11:05:27PM +0300, Alexander Korotkov wrote:
I did some improvements to PHVs patch: revised comments and commit
message. I'm going to push it if no objections.Uh, v2 was there already. That should be v3.
I was doing some work on pg_hint_plan, evaluating the amount of
breakages caused by v18.
Commit fc069a3a6319 (Implement Self-Join Elimination) is standing out
because it has basically broken the reliability of join_search_hook by
doing the self-join eliminations +before+ we have a chance to call the
hook on a joinlist in make_rel_from_joinlist(). For example, take a
query as simple as that:
SELECT * FROM t1 tab1, t1 tab2 WHERE tab1.c1 = tab2.c1;
Before this commit, it was possible to apply hints to each individual
aliases, driving the planner behavior, with levels_needed at 2 as an
effect of the join list having two members. After this commit, the
self-join elimination does its job first, reduces the join list to 1,
removes the knowledge of the join and prevents any control that we had
before in this code path.
Perhaps it is fair to say that this new limitation in pg_hint_plan
should be documented and its tests reworked to avoid that, but it
seems to me that the change of behavior of join_search_hook could
also hurt some existing use cases. I will not disagree that it may be
more useful to know that the joins are reduced when reaching the
modules while calling the hook, but cdf0231c88bd, that has introduced
the join_search_hook back in 2007 was used for a different reason than
the case I'm seeing as broken at the top of v18 today.
Anyway, it seems to me that we may need to do something here before
the release. Note that if the consensus is "you should update your
module and not rely on the past behavior", I'm OK with that. I just
wanted to raise the issue before this goes GA. And well, I have a
pretty big pool of users that rely on this module, so..
--
Michael
On 26/6/2025 07:40, Michael Paquier wrote:
Anyway, it seems to me that we may need to do something here before
the release. Note that if the consensus is "you should update your
module and not rely on the past behavior", I'm OK with that. I just
wanted to raise the issue before this goes GA. And well, I have a
pretty big pool of users that rely on this module, so..
Thanks for the report and detailed explanation!
Before diving into the pg_hint_plan code, I wonder why you don't have
similar issues with the remove_useless_joins. We intentionally designed
SJE coupled with the left-join removal feature to avoid such type of
complaints:
CREATE TABLE test (x integer PRIMARY KEY);
EXPLAIN (COSTS OFF)
SELECT t1.* FROM test t1 LEFT JOIN test t2 ON (t1.x=t2.x);
/*
QUERY PLAN
---------------------
Seq Scan on test t1
(1 row)
*/
It seems that this join removal is also beyond the pg_hint_plan control ...
--
regards, Andrei Lepikhov
On Thu, Jun 26, 2025 at 08:54:55AM +0200, Andrei Lepikhov wrote:
Before diving into the pg_hint_plan code, I wonder why you don't have
similar issues with the remove_useless_joins. We intentionally designed SJE
coupled with the left-join removal feature to avoid such type of complaints:CREATE TABLE test (x integer PRIMARY KEY);
EXPLAIN (COSTS OFF)
SELECT t1.* FROM test t1 LEFT JOIN test t2 ON (t1.x=t2.x);/*
QUERY PLAN
---------------------
Seq Scan on test t1
(1 row)
*/It seems that this join removal is also beyond the pg_hint_plan control ...
Yeah, remove_useless_joins() has been around since the 9.0 ages as far
as I know, and it's not bothered the module much in the tests because
we have unlikely relied on it.
For now the trick I am going to rely on is just disable
enable_self_join_elimination in the test paths where I rely on the
same relation and self joins to keep the plans of the regression tests
stable. This stuff could create separate relations, but then with
back-patching in mind that's going to be just extra conflict noise,
which is always annoying when dealing with plan outputs. We've done
that in the past with max_parallel_workers_per_gather or jit, to keep
the expected plans stable.
The point regarding the search join hook may stand, though. Perhaps
somebody should check if we're still OK with this change in the
context of the self-join work. I tend to think that we are and I
agree that removing the joins when calling the hook can show benefits,
but it may be surprising and users tend to be very noisy with plan
stability, particularly if some of the FROM aliases get silenced by
the backend without the module knowing about that. At least there is
the trick with SET enable_self_join_elimination available as a last
resort method.
--
Michael
On 27/6/2025 02:26, Michael Paquier wrote:
The point regarding the search join hook may stand, though. Perhaps
somebody should check if we're still OK with this change in the
context of the self-join work. I tend to think that we are and I
agree that removing the joins when calling the hook can show benefits,
but it may be surprising and users tend to be very noisy with plan
stability, particularly if some of the FROM aliases get silenced by
the backend without the module knowing about that. At least there is
the trick with SET enable_self_join_elimination available as a last
resort method.
Parse tree transformation (pull-ups, flattenings) adds joins, constant
clause evaluation or partition pruning may smash whole subtrees - the
optimisation process is quite unstable from the query structure's point
of view.
If I understand correctly, pg_hint_plan needs to differentiate 'never
existed' relations and removed ones. In that sense, up to v.15, Postgres
saved RelOptInfo after removing the join (see e9a20e4).
--
regards, Andrei Lepikhov
Hi, Michael!
On Fri, Jun 27, 2025 at 3:26 AM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Jun 26, 2025 at 08:54:55AM +0200, Andrei Lepikhov wrote:
Before diving into the pg_hint_plan code, I wonder why you don't have
similar issues with the remove_useless_joins. We intentionally designed SJE
coupled with the left-join removal feature to avoid such type of complaints:CREATE TABLE test (x integer PRIMARY KEY);
EXPLAIN (COSTS OFF)
SELECT t1.* FROM test t1 LEFT JOIN test t2 ON (t1.x=t2.x);/*
QUERY PLAN
---------------------
Seq Scan on test t1
(1 row)
*/It seems that this join removal is also beyond the pg_hint_plan control ...
Yeah, remove_useless_joins() has been around since the 9.0 ages as far
as I know, and it's not bothered the module much in the tests because
we have unlikely relied on it.For now the trick I am going to rely on is just disable
enable_self_join_elimination in the test paths where I rely on the
same relation and self joins to keep the plans of the regression tests
stable. This stuff could create separate relations, but then with
back-patching in mind that's going to be just extra conflict noise,
which is always annoying when dealing with plan outputs. We've done
that in the past with max_parallel_workers_per_gather or jit, to keep
the expected plans stable.The point regarding the search join hook may stand, though. Perhaps
somebody should check if we're still OK with this change in the
context of the self-join work. I tend to think that we are and I
agree that removing the joins when calling the hook can show benefits,
but it may be surprising and users tend to be very noisy with plan
stability, particularly if some of the FROM aliases get silenced by
the backend without the module knowing about that. At least there is
the trick with SET enable_self_join_elimination available as a last
resort method.
One thing I could additionally propose it to add hook, which gets
called before self-join elimination. If pg_hint_plan will use this
hook, it could prevent self-join elimination or track this fact in its
data structures.
On the other hand we didn't do similar thing for
remove_useless_joins(). Not sure if it's justified now.
------
Regards,
Alexander Korotkov
Supabase
Hi, Mechael!
On Fri, Jun 27, 2025 at 3:55 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Fri, Jun 27, 2025 at 3:26 AM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Jun 26, 2025 at 08:54:55AM +0200, Andrei Lepikhov wrote:
Before diving into the pg_hint_plan code, I wonder why you don't have
similar issues with the remove_useless_joins. We intentionally designed SJE
coupled with the left-join removal feature to avoid such type of complaints:CREATE TABLE test (x integer PRIMARY KEY);
EXPLAIN (COSTS OFF)
SELECT t1.* FROM test t1 LEFT JOIN test t2 ON (t1.x=t2.x);/*
QUERY PLAN
---------------------
Seq Scan on test t1
(1 row)
*/It seems that this join removal is also beyond the pg_hint_plan control ...
Yeah, remove_useless_joins() has been around since the 9.0 ages as far
as I know, and it's not bothered the module much in the tests because
we have unlikely relied on it.For now the trick I am going to rely on is just disable
enable_self_join_elimination in the test paths where I rely on the
same relation and self joins to keep the plans of the regression tests
stable. This stuff could create separate relations, but then with
back-patching in mind that's going to be just extra conflict noise,
which is always annoying when dealing with plan outputs. We've done
that in the past with max_parallel_workers_per_gather or jit, to keep
the expected plans stable.The point regarding the search join hook may stand, though. Perhaps
somebody should check if we're still OK with this change in the
context of the self-join work. I tend to think that we are and I
agree that removing the joins when calling the hook can show benefits,
but it may be surprising and users tend to be very noisy with plan
stability, particularly if some of the FROM aliases get silenced by
the backend without the module knowing about that. At least there is
the trick with SET enable_self_join_elimination available as a last
resort method.One thing I could additionally propose it to add hook, which gets
called before self-join elimination. If pg_hint_plan will use this
hook, it could prevent self-join elimination or track this fact in its
data structures.On the other hand we didn't do similar thing for
remove_useless_joins(). Not sure if it's justified now.
I recently got notification this is in Open Items.
https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
What is your opinion on this: do we need additional hook, fair to
leave this "as is" or another option?
------
Regards,
Alexander Korotkov
Supabase
On Wed, Jul 16, 2025 at 12:38:58AM +0300, Alexander Korotkov wrote:
I recently got notification this is in Open Items.
https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
What is your opinion on this: do we need additional hook, fair to
leave this "as is" or another option?
I'm OK with the statu-quo on my side when it comes to plan hinting.
With the GUC workaround, it's still possible to get through so it is
not like we don't have any options. The point about other extensions
still stands, I think, but perhaps we are OK even on these fronts as
the join search hook is far from being the most popular one AFAIK.
So dropping the item and do nothing is a fine answer.
--
Michael
On Wed, Jul 16, 2025 at 1:16 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Jul 16, 2025 at 12:38:58AM +0300, Alexander Korotkov wrote:
I recently got notification this is in Open Items.
https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
What is your opinion on this: do we need additional hook, fair to
leave this "as is" or another option?I'm OK with the statu-quo on my side when it comes to plan hinting.
With the GUC workaround, it's still possible to get through so it is
not like we don't have any options. The point about other extensions
still stands, I think, but perhaps we are OK even on these fronts as
the join search hook is far from being the most popular one AFAIK.So dropping the item and do nothing is a fine answer.
OK, thank you for your feedback. I've searched GitHub for other users
of join_search_hook. A couple of interesting use cases I found are
the following.
https://github.com/wulczer/saio
https://github.com/ashenBlade/pg_dphyp
But they provide alternative join optimization algorithms. So, in
spite of pg_hint_plan they are probably OK to optimize whatever joins
are left. Thus, it's not yet clear whether this issue affects anybody
else. I've moved this item into the "non-bugs" list, but I'm OK to
re-open it if there are other affected extensions.
------
Regards,
Alexander Korotkov
Supabase