[PATCH] Keeps tracking the uniqueness with UniqueKey

Started by Andy Fan · almost 6 years ago · 77 messages

#1 Andy Fan
zhihui.fan1213@gmail.com
1 attachment(s)

Greetings.

This thread is a follow-up to [1], where I submitted a patch for erasing the
distinct node when we know for sure that the data is unique. Since the
implementation has changed a lot from the beginning and the two are not very
related, I am starting this new thread to discuss the new strategy and save
reviewers' time.

As I said above, my original intention was just to erase the distinct clause.
Tom Lane then suggested the function query_is_distinct_for, and I found the
uniqueness can also be used for costing, remove_useless_joins and
reduce_unique_semijoins. David suggested maintaining the uniqueness from
bottom to top, through joins, subqueries, group-by, distinct, union and so on
(we call it a UniqueKey).

Ideally the uniqueness will not be lost in any case. The current
implementation follows David's suggestion. Thanks also to Ashutosh, who
reassured me that the cost should be OK; I had concerns about that at the
beginning.

A new field named uniquekeys was added to the RelOptInfo struct; it is a
list of UniqueKey structs.

typedef struct UniqueKey
{
	NodeTag		type;
	List	   *exprs;
	List	   *positions;
	bool		grantee;
} UniqueKey;

exprs is a list of expressions which are unique on the current RelOptInfo if
we don't care about NULL values.

positions is a list of the sequence numbers of the exprs within the current
RelOptInfo; it is used for subqueries. For example:

create table t1 (a int primary key, b int);
create table t2 (a int primary key, b int);
select .. from t1, (select b from t2 group by b) t2 ..;

The UniqueKey for the subquery will be Var(varno=1, varattno=2), but for the
top query the UniqueKey of t2 should be Var(varno=2, varattno=1); the 1 here
needs to be calculated from UniqueKey->positions.

The grantee field is introduced mainly for remove_useless_joins &
reduce_unique_semijoins. Take the above case for example:

-- b is nullable, so "select b from t2" can still return duplicated rows.
create unique index t2_uk_b on t2(b);

-- the left join can still be removed, since t2.b has a unique index and the
-- nullability doesn't matter here.
select t1.* from t1 left join t2 on (t1.b = t2.b);

So t2.b will still be a UniqueKey for t2, just with grantee = false.

A bunch of functions named populate_xxx_uniquekeys manipulate uniquekeys for
the various cases; xxx may be baserel, joinrel, partitioned table, unionrel,
groupbyrel, distinctrel and so on. Partitioned tables have some non-obvious
troubles, because users can create indexes on the childrels directly and
differently; you can check the comments in the code for details.

When maintaining the uniquekeys of a joinrel, we have a rule that if both
rels have UniqueKeys, then any combination from the two sides is a uniquekey
of the joinrel. I used two measures to keep the list of UniqueKeys short. One
is that we only add useful UniqueKeys to RelOptInfo.uniquekeys: if an expr
doesn't appear in rel->reltarget->exprs, it can't be used by anything else,
so we can ignore it safely. The other is that if a column set A is already a
uniquekey, no superset of A needs to be added as a UniqueKey.

The overall cost of maintaining uniquekeys should be OK. If you check the
code you may find many 2- or 3-level foreach loops, but most of them start
from a unique index, and I used UniqueKeyContext and SubqueryUniqueKeyContext
in the joinrel and subquery cases to avoid too many loops.

Now I have used the UniqueKeys to erase unnecessary distinct/group-by nodes,
and also changed rel_is_distinct_for to use UniqueKeys, so we can handle more
cases.

create table m1 (a int primary key, b int, c int);
create table m2 (a int primary key, b int, c int);
create table m3 (a int primary key, b int, c int);

With the current patch, we get:
task3=# explain select t1.a from m3 t1 left join (select m1.a from m1, m2 where m1.b = m2.a limit 1) t2 on (t1.a = t2.a);
                        QUERY PLAN
---------------------------------------------------------
 Seq Scan on m3 t1  (cost=0.00..32.60 rows=2260 width=4)

Before the patch, we got:
postgres=# explain select t1.a from m3 t1 left join (select m1.a from m1, m2 where m1.b = m2.a limit 1) t2 on (t1.a = t2.a);
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Hash Left Join  (cost=0.39..41.47 rows=2260 width=4)
   Hash Cond: (t1.a = m1.a)
   ->  Seq Scan on m3 t1  (cost=0.00..32.60 rows=2260 width=4)
   ->  Hash  (cost=0.37..0.37 rows=1 width=4)
         ->  Limit  (cost=0.15..0.36 rows=1 width=4)
               ->  Nested Loop  (cost=0.15..470.41 rows=2260 width=4)
                     ->  Seq Scan on m1  (cost=0.00..32.60 rows=2260 width=8)
                     ->  Index Only Scan using m2_pkey on m2  (cost=0.15..0.19 rows=1 width=4)
                           Index Cond: (a = m1.b)

The "limit 1" here is just to prevent pull_up_subqueries from pulling up the
subquery. I think we still have opportunities to improve this further if we
check whether we can remove a join *just before we join 2 relations*; we have
a similar situation for reduce_unique_semijoins. After that change is done,
we can remove the "limit 1" here to show the difference. I didn't include
this change in the current patch since the effort may not be small and I want
to keep this patch simple.

Some known issues need attention:
1. I didn't check collations at any stage; one reason is that
relation_has_unique_index_for doesn't check them either. The other reason is
that if column a is unique in collation A, it is unique in collation B as
well, so we can ignore it. [2]
2. The test contrib/postgres_fdw/sql/postgres_fdw.sql still fails. I am not
sure whether the bug is in my patch or not.

Kindly waiting for your feedback. Thank you!

[1]: /messages/by-id/CAKU4AWqOORqW900O-+L4L2+0xknsEqpfcs9FF7SeiO9TmpeZOg@mail.gmail.com

[2]: /messages/by-id/CAKU4AWqOORqW900O-+L4L2+0xknsEqpfcs9FF7SeiO9TmpeZOg@mail.gmail.com

Best regards
Andy Fan

Attachments:

v1-0001-Maintain-the-uniqueness-of-a-Query-from-bottom-to.patch (application/octet-stream)
From 4b1ca029370c6cec2ecb2d08f737f2aaf96895a1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 23 Mar 2020 17:58:00 +0800
Subject: [PATCH v1] Maintain the uniqueness of a Query from bottom to top. The
 UniqueKey

can be used to erase distinct and group by, and improve remove_useless_joins
& reduce_unique_semijoins
---
 src/backend/nodes/equalfuncs.c                |    3 +-
 src/backend/nodes/list.c                      |   27 +
 src/backend/nodes/makefuncs.c                 |   15 +
 src/backend/optimizer/path/Makefile           |    3 +-
 src/backend/optimizer/path/allpaths.c         |   25 +-
 src/backend/optimizer/path/joinrels.c         |    2 +
 src/backend/optimizer/path/uniquekeys.c       | 1005 +++++++++++++++++
 src/backend/optimizer/plan/analyzejoins.c     |  171 +--
 src/backend/optimizer/plan/initsplan.c        |    9 +
 src/backend/optimizer/plan/planmain.c         |   13 -
 src/backend/optimizer/plan/planner.c          |   33 +-
 src/backend/optimizer/prep/prepunion.c        |    2 +
 src/backend/optimizer/util/plancat.c          |    8 +
 src/include/nodes/makefuncs.h                 |    2 +
 src/include/nodes/nodes.h                     |    1 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/pg_list.h                   |    2 +
 src/include/optimizer/paths.h                 |   37 +
 src/test/regress/expected/aggregates.out      |   83 +-
 src/test/regress/expected/join.out            |   34 +-
 src/test/regress/expected/select_distinct.out |  388 +++++++
 src/test/regress/sql/join.sql                 |    3 +
 src/test/regress/sql/select_distinct.sql      |  122 ++
 23 files changed, 1768 insertions(+), 239 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekeys.c

diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 5b1ba143b1..4a49f18479 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -162,8 +162,9 @@ _equalIntoClause(const IntoClause *a, const IntoClause *b)
 static bool
 _equalVar(const Var *a, const Var *b)
 {
-	COMPARE_SCALAR_FIELD(varno);
+	/* Compare varattno first since it has higher selectivity than varno */
 	COMPARE_SCALAR_FIELD(varattno);
+	COMPARE_SCALAR_FIELD(varno);
 	COMPARE_SCALAR_FIELD(vartype);
 	COMPARE_SCALAR_FIELD(vartypmod);
 	COMPARE_SCALAR_FIELD(varcollid);
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index bd0c58cd81..9c121f5d75 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -688,6 +688,33 @@ list_member_oid(const List *list, Oid datum)
 	return false;
 }
 
+/*
+ * Return true iff there is an equal member in 'target' for every
+ * member in 'members'.
+ */
+bool
+list_all_members_in(const List *members, const List *target)
+{
+	const ListCell	*lc1, *lc2;
+	if (target == NIL && members != NIL)
+		return false;
+	foreach(lc1, members)
+	{
+		bool found = false;
+		foreach(lc2, target)
+		{
+			if (equal(lfirst(lc1), lfirst(lc2)))
+			{
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			return false;
+	}
+	return true;
+}
+
 /*
  * Delete the n'th cell (counting from 0) in list.
  *
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index e8cdc90c31..163d352c4f 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -809,3 +809,18 @@ makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols)
 	v->va_cols = va_cols;
 	return v;
 }
+
+
+/*
+ * makeUniqueKey
+ */
+UniqueKey *
+makeUniqueKey(List *exprs, List *positions, bool grantee)
+{
+	UniqueKey * ukey = makeNode(UniqueKey);
+	Assert(list_length(exprs) == list_length(positions));
+	ukey->exprs = exprs;
+	ukey->positions = positions;
+	ukey->grantee = grantee;
+	return ukey;
+}
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..7b9820c25f 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekeys.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8286d9cf34..1a26617381 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -39,6 +39,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
@@ -222,14 +223,23 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	set_base_rel_pathlists(root);
 
 	/*
-	 * Generate access paths for the entire join tree.
+	 * Remove any useless outer joins.  Ideally this would be done during
+	 * jointree preprocessing, but the necessary information isn't available
+	 * until we've built baserel data structures, classified qual clauses
+	 * and uniquekeys
 	 */
-	rel = make_rel_from_joinlist(root, joinlist);
+	joinlist = remove_useless_joins(root, joinlist);
 
 	/*
-	 * The result should join all and only the query's base rels.
+	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
+	 * Likewise, this can't be done until now for lack of needed info.
 	 */
-	Assert(bms_equal(rel->relids, root->all_baserels));
+	reduce_unique_semijoins(root);
+
+	/*
+	 * Generate access paths for the entire join tree.
+	 */
+	rel = make_rel_from_joinlist(root, joinlist);
 
 	return rel;
 }
@@ -786,6 +796,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
+
+	/* Set UniqueKeys for this relation */
+	populate_baserel_uniquekeys(root, rel, rel->indexlist);
 }
 
 /*
@@ -1276,6 +1289,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Add paths to the append relation. */
 	add_paths_to_append_rel(root, rel, live_childrels);
+	if (IS_PARTITIONED_REL(rel))
+		populate_partitionedrel_uniquekeys(root, rel, live_childrels);
 }
 
 
@@ -2349,6 +2364,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 										  pathkeys, required_outer));
 	}
 
+	convert_subquery_uniquekeys(root, rel, sub_final_rel);
+
 	/* If outer rel allows parallelism, do same for partial paths. */
 	if (rel->consider_parallel && bms_is_empty(required_outer))
 	{
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index a21c295b99..c6799aa48c 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -920,6 +920,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
 
 	/* Apply partitionwise join technique, if possible. */
 	try_partitionwise_join(root, rel1, rel2, joinrel, sjinfo, restrictlist);
+
+	populate_joinrel_uniquekeys(root, joinrel, rel1, rel2, restrictlist, sjinfo->jointype);
 }
 
 
diff --git a/src/backend/optimizer/path/uniquekeys.c b/src/backend/optimizer/path/uniquekeys.c
new file mode 100644
index 0000000000..0eb904212a
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekeys.c
@@ -0,0 +1,1005 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekeys.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "rewrite/rewriteManip.h"
+
+
+/*
+ * This struct is used to help populate_joinrel_uniquekeys.
+ * added_to_joinrel is set to true once a uniquekey has been added to the
+ * joinrel.  For a joinrel, if both sides have UniqueKeys, then any
+ * combination of them must be unique for the joinrel as well, but we don't
+ * need to add such a combination if either of its parts has been added to
+ * the joinrel already.  We use this struct to maintain that info.
+ */
+typedef struct UniqueKeyContextData
+{
+	UniqueKey	*uniquekey;
+	/* Set to true if the unique key has been added to joinrel->uniquekeys */
+	bool	added_to_joinrel;
+	/* If this uniquekey is still useful after join */
+	bool	useful;
+} *UniqueKeyContext;
+
+
+static List *gather_mergeable_baserestrictlist(RelOptInfo *rel);
+static List *gather_mergeable_joinclauses(RelOptInfo *joinrel,
+										  RelOptInfo *rel1,
+										  RelOptInfo *rel2,
+										  List *restrictlist,
+										  JoinType jointype);
+static bool match_index_to_baserestrictinfo(IndexOptInfo *unique_ind,
+											List *restrictlist);
+static List *initililze_unqiuecontext_for_joinrel(RelOptInfo *joinrel,
+												  RelOptInfo *inputrel);
+
+static bool innerrel_keeps_unique(PlannerInfo *root,
+								  RelOptInfo *outerrel,
+								  RelOptInfo *innerrel,
+								  List *restrictlist,
+								  bool reverse);
+static bool clause_sides_match_join(RestrictInfo *rinfo,
+									Relids outerrelids,
+									Relids innerrelids);
+static void add_uniquekey_from_index(RelOptInfo *rel,
+									 IndexOptInfo *unique_index);
+static void add_uniquekey_for_onerow(RelOptInfo *rel);
+
+/* Used for unique-index checking on partitioned tables */
+static bool index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind);
+static IndexOptInfo *simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+													 IndexOptInfo *from);
+static bool simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2);
+static void adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+											  RelOptInfo *childrel,
+											  List **global_unique_index);
+/* Helper function for grouprel/distinctrel */
+static void add_uniquekey_from_sortgroups(PlannerInfo *root,
+										  RelOptInfo *rel,
+										  List *sortgroups);
+
+/*
+ * populate_baserel_uniquekeys
+ *		Populate 'baserel's uniquekeys list by looking at the rel's unique
+ *		indexes and baserestrictinfo.
+ */
+void
+populate_baserel_uniquekeys(PlannerInfo *root,
+							RelOptInfo *baserel,
+							List *indexlist)
+{
+	ListCell *lc;
+	List	*restrictlist = gather_mergeable_baserestrictlist(baserel);
+	bool	return_one_row = false;
+	List	*matched_uk_indexes = NIL;
+
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	if (root->parse->hasTargetSRFs)
+		return;
+	foreach(lc, indexlist)
+	{
+		IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		if (match_index_to_baserestrictinfo(ind, restrictlist))
+		{
+			return_one_row = true;
+			break;
+		}
+		/* We can't guarantee an expression will not return a NULL */
+		if (ind->indexprs == NIL)
+			matched_uk_indexes = lappend(matched_uk_indexes, ind);
+	}
+
+	if (return_one_row)
+	{
+		/*
+		 * Since only 1 row is returned, any column is unique
+		 */
+		add_uniquekey_for_onerow(baserel);
+	}
+	else
+	{
+		foreach(lc, matched_uk_indexes)
+			add_uniquekey_from_index(baserel, lfirst_node(IndexOptInfo, lc));
+	}
+}
+
+
+/*
+ * populate_partitionedrel_uniquekeys
+ * A unique index can be used for a UniqueKey only if:
+ * 1). it includes the partition keys, and
+ * 2). all the childrels have the same index.
+ */
+void
+populate_partitionedrel_uniquekeys(PlannerInfo *root,
+									RelOptInfo *rel,
+									List *childrels)
+{
+	ListCell	*lc;
+	List	*global_unique_indexlist = NIL;
+	RelOptInfo *childrel;
+	bool is_first = true;
+
+	Assert(IS_PARTITIONED_REL(rel));
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	if (childrels == NIL)
+		return;
+
+	childrel = linitial_node(RelOptInfo, childrels);
+	foreach(lc, childrel->indexlist)
+	{
+		IndexOptInfo *ind = lfirst(lc);
+		IndexOptInfo *global_ind;
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		global_ind = simple_copy_indexinfo_to_parent(rel, ind);
+		/*
+		 * If the unique index doesn't contain the partkey, then it is unique
+		 * on this partition only, so it is useless for us.
+		 */
+		if (!index_constains_partkey(rel, global_ind))
+			continue;
+		global_unique_indexlist = lappend(global_unique_indexlist,  global_ind);
+	}
+
+	/* Fast path */
+	if (global_unique_indexlist == NIL)
+		return;
+
+	foreach(lc, childrels)
+	{
+		RelOptInfo *child = lfirst(lc);
+		if (is_first)
+		{
+			is_first = false;
+			continue;
+		}
+		adjust_partition_unique_indexlist(rel, child, &global_unique_indexlist);
+	}
+
+	/*
+	 * Now we have the list of unique indexes that are exactly the same on
+	 * all childrels.  Set the UniqueKeys just as for a non-partitioned table.
+	 */
+	populate_baserel_uniquekeys(root, rel, global_unique_indexlist);
+}
+
+
+/*
+ * populate_distinctrel_uniquekeys
+ */
+void
+populate_distinctrel_uniquekeys(PlannerInfo *root,
+									 RelOptInfo *inputrel,
+									 RelOptInfo *distinctrel)
+{
+	/* The unique keys before the distinct are still valid */
+	distinctrel->uniquekeys = list_copy(inputrel->uniquekeys);
+	add_uniquekey_from_sortgroups(root, distinctrel, root->parse->distinctClause);
+}
+
+/*
+ * populate_grouprel_uniquekeys
+ */
+void
+populate_grouprel_uniquekeys(PlannerInfo *root,
+							 RelOptInfo *grouprel)
+{
+	Query *parse = root->parse;
+	if (parse->hasTargetSRFs)
+		return;
+	if (parse->groupingSets != NIL)
+		return;
+	if (parse->groupClause != NIL)
+		add_uniquekey_from_sortgroups(root,
+									  grouprel,
+									  root->parse->groupClause);
+	else
+		/* it has aggregation but no GROUP BY, so only one row is returned */
+		add_uniquekey_for_onerow(grouprel);
+}
+
+/*
+ * simple_copy_uniquekeys
+ * Using a function for this one-line operation makes it easy to check where
+ * we simply copied the uniquekeys.
+ */
+void
+simple_copy_uniquekeys(RelOptInfo *oldrel,
+					   RelOptInfo *newrel)
+{
+	newrel->uniquekeys = oldrel->uniquekeys;
+}
+
+/*
+ *  populate_unionrel_uniquiekeys
+ */
+extern void populate_unionrel_uniquiekeys(PlannerInfo *root,
+										  RelOptInfo *unionrel)
+{
+	ListCell	*lc;
+	List	*exprs = NIL;
+	List	*colnos = NIL;
+	int i = 1;
+
+	Assert(unionrel->uniquekeys == NIL);
+
+	foreach(lc,  unionrel->reltarget->exprs)
+	{
+		exprs = lappend(exprs, lfirst(lc));
+		colnos = lappend_int(colnos, i);
+		i++;
+	}
+	unionrel->uniquekeys = lappend(unionrel->uniquekeys,
+								   makeUniqueKey(exprs, colnos, true));
+}
+
+/*
+ * Examine 'restrictlist' to determine whether rel2 can duplicate rows in
+ * rel1 and vice-versa.  If the relation at the other side of the join
+ * cannot cause row duplication, then tag the uniquekeys for this relation
+ * onto 'joinrel's uniquekey list.  Also, for the joinrel, any combination
+ * of UniqueKeys from both sides is unique as well, no matter what the join
+ * clauses are.
+ */
+void
+populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List *restrictlist, JoinType jointype)
+{
+	ListCell *lc, *lc2;
+	List	*clause_list = NIL;
+	List	*outerrel_uniquekey_ctx;
+	List	*innerrel_uniquekey_ctx;
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	/* Care about the left relation only for SEMI/ANTI join */
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+	{
+		foreach(lc, outerrel->uniquekeys)
+		{
+			UniqueKey	*uniquekey = lfirst_node(UniqueKey, lc);
+			if (list_all_members_in(uniquekey->exprs, joinrel->reltarget->exprs))
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, uniquekey);
+		}
+		return;
+	}
+    /* fast path */
+	if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+		return;
+	outerrel_uniquekey_ctx = initililze_unqiuecontext_for_joinrel(joinrel, outerrel);
+	innerrel_uniquekey_ctx = initililze_unqiuecontext_for_joinrel(joinrel, innerrel);
+
+	if (outerrel_uniquekey_ctx == NIL || innerrel_uniquekey_ctx == NIL)
+		return;
+
+	clause_list = gather_mergeable_joinclauses(joinrel, outerrel, innerrel,
+											   restrictlist, jointype);
+
+	if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+	{
+		/* XXX It looks like we don't need to filter out the grantee=false case?
+		 * create table m1(a int primary key, b int);
+		 * create unique index m1_uk_b on m1(b);
+		 * create table m2(a int primary key, b int);
+		 *
+		 * select m1.a from m1 left join m2 on (m1.b = m2.b);
+		 * m1.b is a uniquekey with grantee=false,
+		 * but m1.a is still a valid uniquekey for the joinrel.
+		 */
+		foreach(lc, innerrel_uniquekey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+			if (!list_all_members_in(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+			/*
+			 * XXX Even for left join/full join types, we can still keep the
+			 * uniquekey with its grantee unchanged.
+			 */
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys, ctx->uniquekey);
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	if (innerrel_keeps_unique(root, innerrel, outerrel, clause_list, true))
+	{
+		foreach(lc, outerrel_uniquekey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+			if (!list_all_members_in(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys, ctx->uniquekey);
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	/*
+	 * The combinations of UniqueKeys from both sides are unique as well, but
+	 * don't bother to add one if a subset of it has been added already.
+	 */
+	foreach(lc, outerrel_uniquekey_ctx)
+	{
+		UniqueKeyContext context1 = (UniqueKeyContext) lfirst(lc);
+		if (context1->added_to_joinrel || !context1->useful)
+			continue;
+		foreach(lc2, innerrel_uniquekey_ctx)
+		{
+			UniqueKeyContext context2 = (UniqueKeyContext) lfirst(lc2);
+			List	*exprs = NIL, *colnos = NIL;
+			if (context2->added_to_joinrel || !context2->useful)
+				continue;
+			exprs = list_copy(context1->uniquekey->exprs);
+			colnos = list_copy(context1->uniquekey->positions);
+			exprs = list_concat(exprs, context2->uniquekey->exprs);
+			colnos = list_concat(colnos, context2->uniquekey->positions);
+			/*
+			 * Only if both sides are grantee do we mark the combination as
+			 * grantee.  Suppose m1.b is a unique key with grantee=true and
+			 * m2.b is a unique key with grantee=false; with data
+			 * m1 = ((1), (2)) and m2 = ((null), (null)), the combination
+			 * (m1.b, m2.b) would contain duplicated rows.
+			 */
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys, makeUniqueKey(exprs, colnos,
+																			 context1->uniquekey->grantee &&
+																			 context2->uniquekey->grantee));
+		}
+	}
+}
+
+
+/*
+ * Used to avoid multiple scans of rel->reltarget->exprs.  See
+ * convert_subquery_uniquekeys for details.
+ */
+typedef struct SubqueryUniqueKeyData
+{
+	/*
+	 * Only a Var referencing one of the subquery's unique columns is unique
+	 * as well; we can't guarantee anything else.
+	 */
+	Var *var;
+
+	/* the position of the var in the rel->reltarget */
+	int pos;
+} *SubqueryUniqueKeyContext;
+
+/*
+ * convert_subquery_uniquekeys
+ *
+ * currel is the RelOptInfo for the subquery RTE in the outer query;
+ * sub_final_rel is the RelOptInfo we get from fetch_upper_rel for the
+ * subquery itself.
+ *
+ * Example: select distinct m2.b, m1.a from m1, (select abs(b) from m2 group by abs(b)) m2;
+ */
+void convert_subquery_uniquekeys(PlannerInfo *root,
+								 RelOptInfo *currel,
+								 RelOptInfo *sub_final_rel)
+{
+	SubqueryUniqueKeyContext *ctx_array;
+	SubqueryUniqueKeyContext ctx;
+	Index max_colno_subq = 0;
+	ListCell	*lc, *lc2;
+	int pos = 0;
+
+	if (sub_final_rel->uniquekeys == NIL)
+		/* This should be a common case */
+		return;
+
+	/*
+	 * In fact we could check this with list_length(sub_final_rel->reltarget->exprs);
+	 * however, reltarget is not set on the UPPERREL_FINAL relation, so do it
+	 * like this.
+	 */
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey * ukey = lfirst_node(UniqueKey, lc);
+		foreach(lc2, ukey->positions)
+		{
+			Index colno = lfirst_int(lc2);
+			if (max_colno_subq < colno)
+				max_colno_subq = colno;
+		}
+	}
+
+	Assert(max_colno_subq > 0);
+	ctx_array = palloc0(sizeof(SubqueryUniqueKeyContext *) * (max_colno_subq + 1));
+
+	foreach(lc, currel->reltarget->exprs)
+	{
+		Var *var;
+		int colno_subq;
+		pos++;
+		if (!IsA(lfirst(lc), Var))
+			continue;
+
+		var = lfirst_node(Var, lc);
+		colno_subq = var->varattno;
+		if (colno_subq > max_colno_subq)
+			/* we don't allocate the memory for it and it is useless for us */
+			continue;
+		ctx_array[colno_subq] = palloc0(sizeof(struct SubqueryUniqueKeyData));
+		ctx = ctx_array[colno_subq]; /* corresponding to subquery's uniquekey->positions[x] */
+		ctx->pos = pos; /* the position in current targetlist,  will be used to set UnqiueKey */
+		ctx->var = var;
+	}
+
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey * ukey = lfirst_node(UniqueKey, lc);
+		bool uniquekey_useful = true;
+		List	*exprs = NIL;
+		List	*colnos = NIL;
+		foreach(lc2, ukey->positions)
+		{
+			Index sub_colno = lfirst_int(lc2);
+			ctx = ctx_array[sub_colno];
+			if (ctx == NULL)
+			{
+				/* the uk is not used outside */
+				uniquekey_useful = false;
+				break;
+			}
+			exprs = lappend(exprs, ctx->var);
+			colnos = lappend_int(colnos, ctx->pos);
+		}
+		if (uniquekey_useful)
+			currel->uniquekeys = lappend(currel->uniquekeys,
+										 makeUniqueKey(exprs, colnos, ukey->grantee));
+	}
+}
+
+
+/*
+ * innerrel_keeps_unique
+ *
+ * Check if a UniqueKey of the innerrel is still valid after the join.  The
+ * innerrel's UniqueKey remains valid if the innerrel's columns are joined,
+ * with mergeable operators, to a uniquekey of the outerrel, so we need to
+ * find out whether an outerrel uniquekey is covered by the clause_list.
+ * Note: the clause_list must be a list of mergeable restrictinfos.
+ */
+static bool
+innerrel_keeps_unique(PlannerInfo *root,
+					  RelOptInfo *outerrel,
+					  RelOptInfo *innerrel,
+					  List *clause_list,
+					  bool reverse)
+{
+	ListCell	*lc, *lc2, *lc3;
+	if (outerrel->uniquekeys == NIL || innerrel->uniquekeys == NIL)
+		return false;
+
+	foreach(lc, outerrel->uniquekeys)
+	{
+		List	*outer_uq_exprs = lfirst_node(UniqueKey, lc)->exprs;
+		bool clauselist_matchs_all_exprs = true;
+		foreach(lc2, outer_uq_exprs)
+		{
+			Node *outer_uq_expr = lfirst(lc2);
+			bool find_uq_expr_in_clauselist = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *outer_expr;
+				if (reverse)
+					outer_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				else
+					outer_expr = rinfo->outer_is_left ? get_leftop(rinfo->clause) : get_rightop(rinfo->clause);
+				if (equal(outer_expr, outer_uq_expr))
+				{
+					find_uq_expr_in_clauselist = true;
+					break;
+				}
+			}
+			if (!find_uq_expr_in_clauselist)
+			{
+				/* No need to check the next exprs in the current uniquekey */
+				clauselist_matchs_all_exprs = false;
+				break;
+			}
+		}
+
+		if (clauselist_matchs_all_exprs)
+			/*
+			 * If the clause_list matches all exprs of some uk from the
+			 * outerrel, the innerrel will be unique, based on the fact that
+			 * innerrel->uniquekeys != NIL, which is checked at the beginning.
+			 */
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * relation_has_uniquekeys_for
+ *		Returns true if we have proofs that 'rel' cannot return multiple rows with
+ *		the same values in each of 'exprs'.  Otherwise returns false.
+ */
+bool
+relation_has_uniquekeys_for(PlannerInfo *root, RelOptInfo *rel, List *exprs)
+{
+	ListCell *lc;
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		if (!ukey->grantee)
+			continue;
+		if (list_all_members_in(ukey->exprs, exprs))
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * Examine the rel's restriction clauses for usable var = const clauses
+ */
+static List*
+gather_mergeable_baserestrictlist(RelOptInfo *rel)
+{
+	List	*restrictlist = NIL;
+	ListCell	*lc;
+	foreach(lc, rel->baserestrictinfo)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
+
+		/*
+		 * Note: can_join won't be set for a restriction clause, but
+		 * mergeopfamilies will be if it has a mergejoinable operator and
+		 * doesn't contain volatile functions.
+		 */
+		if (restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * The clause certainly doesn't refer to anything but the given rel.
+		 * If either side is pseudoconstant then we can use it.
+		 */
+		if (bms_is_empty(restrictinfo->left_relids))
+		{
+			/* righthand side is inner */
+			restrictinfo->outer_is_left = true;
+		}
+		else if (bms_is_empty(restrictinfo->right_relids))
+		{
+			/* lefthand side is inner */
+			restrictinfo->outer_is_left = false;
+		}
+		else
+			continue;
+
+		/* OK, add to list */
+		restrictlist = lappend(restrictlist, restrictinfo);
+	}
+	return restrictlist;
+}
+
+
+/*
+ * gather_mergeable_joinclauses
+ */
+static List*
+gather_mergeable_joinclauses(RelOptInfo *joinrel,
+							 RelOptInfo *outerrel,
+							 RelOptInfo *innerrel,
+							 List *restrictlist,
+							 JoinType jointype)
+{
+	List	*clause_list = NIL;
+	ListCell	*lc;
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *)lfirst(lc);
+		if (IS_OUTER_JOIN(jointype) &&
+			RINFO_IS_PUSHED_DOWN(restrictinfo, joinrel->relids))
+			continue;
+
+		/* Ignore if it's not a mergejoinable clause */
+		if (!restrictinfo->can_join ||
+			restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * Check if clause has the form "outer op inner" or "inner op outer",
+		 * and if so mark which side is inner.
+		 */
+		if (!clause_sides_match_join(restrictinfo, outerrel->relids, innerrel->relids))
+			continue;			/* no good for these input relations */
+
+		/* OK, add to list */
+		clause_list = lappend(clause_list, restrictinfo);
+	}
+	return clause_list;
+}
+
+
+/*
+ * Return true if every column of the unique index is matched by an "expr = Const" clause in the restrictlist
+ */
+static bool
+match_index_to_baserestrictinfo(IndexOptInfo *unique_ind, List *restrictlist)
+{
+	int c = 0;
+
+	/* A fast path to avoid the two-level loop scan below */
+	if (list_length(restrictlist) < unique_ind->ncolumns)
+		return false;
+
+	for(c = 0;  c < unique_ind->ncolumns; c++)
+	{
+		ListCell	*lc;
+		bool	found_in_restrictinfo = false;
+		foreach(lc, restrictlist)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			Node	   *rexpr;
+
+			/*
+			 * The condition's equality operator must be a member of the
+			 * index opfamily, else it is not asserting the right kind of
+			 * equality behavior for this index.  We check this first
+			 * since it's probably cheaper than match_index_to_operand().
+			 */
+			if (!list_member_oid(rinfo->mergeopfamilies, unique_ind->opfamily[c]))
+				continue;
+
+			/*
+			 * XXX at some point we may need to check collations here too.
+			 * For the moment we assume all collations reduce to the same
+			 * notion of equality.
+			 */
+
+			/* OK, see if the condition operand matches the index key */
+			if (rinfo->outer_is_left)
+				rexpr = get_rightop(rinfo->clause);
+			else
+				rexpr = get_leftop(rinfo->clause);
+
+			if (match_index_to_operand(rexpr, c, unique_ind))
+			{
+				found_in_restrictinfo = true;
+				break;
+			}
+		}
+		if (!found_in_restrictinfo)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * add_uniquekey_from_index
+ * 	We only add the Index Vars whose expr exists in rel->reltarget
+ */
+static void
+add_uniquekey_from_index(RelOptInfo *rel, IndexOptInfo *unique_index)
+{
+	int	c;
+	List	*exprs = NIL;
+	List	*positions = NIL;
+	bool	grantee = true;
+
+	/* Only add the index which exists in rel->reltarget */
+	for(c = 0; c < unique_index->ncolumns; c++)
+	{
+		ListCell	*lc;
+		bool	find_in_exprs = false;
+		foreach(lc, rel->reltarget->exprs)
+		{
+			Var *var;
+			/* We never know whether a FuncExpr is nullable, so we only handle Vars */
+			if (!IsA(lfirst(lc), Var))
+				continue;
+			var = lfirst_node(Var, lc);
+			if (var->varattno < InvalidAttrNumber)
+				/* System column */
+				continue;
+			/* Must check NOT NULL for a unique index */
+			if (!bms_is_member(var->varattno, rel->not_null_cols))
+				grantee = false;
+
+			/* To keep the uniquekey short, we only add it if it exists in rel->reltarget->exprs */
+			if (match_index_to_operand((Node *)lfirst(lc), c, unique_index))
+			{
+				find_in_exprs = true;
+				exprs = lappend(exprs, lfirst(lc));
+				positions = lappend_int(positions, c+1);
+				break;
+			}
+		}
+		if (!find_in_exprs)
+			return;
+	}
+
+	if (exprs != NIL)
+	{
+		rel->uniquekeys = lappend(rel->uniquekeys,
+								  makeUniqueKey(exprs, positions, grantee));
+	}
+}
+
+
+/*
+ * add_uniquekey_for_onerow
+ * If we are sure the relation returns only one row, then all the columns
+ * are unique.
+ */
+void
+add_uniquekey_for_onerow(RelOptInfo *rel)
+{
+	int c = 1;
+	ListCell	*lc;
+	foreach(lc,  rel->reltarget->exprs)
+	{
+		/* Every column in this relation is unique since only 1 row is returned.
+		 * No need to check whether it is a Var or nullable; we can guarantee the uniqueness.
+		 */
+		rel->uniquekeys = lappend(rel->uniquekeys,
+								  makeUniqueKey(list_make1(lfirst(lc)),
+												list_make1_int(c),
+												true));
+		c++;
+	}
+}
+
+/*
+ * initililze_unqiuecontext_for_joinrel
+ * Return a List of UniqueKeyContext for an inputrel; we also filter out
+ * all the uniquekeys which are not possible to use later.
+ */
+static List *
+initililze_unqiuecontext_for_joinrel(RelOptInfo *joinrel,  RelOptInfo *inputrel)
+{
+	List	*res = NIL;
+	ListCell *lc;
+	foreach(lc,  inputrel->uniquekeys)
+	{
+		UniqueKeyContext context;
+		context = palloc(sizeof(struct UniqueKeyContextData));
+		context->uniquekey = lfirst_node(UniqueKey, lc);
+		context->added_to_joinrel = false;
+		context->useful = true;
+		res = lappend(res, context);
+	}
+	return res;
+}
+
+/*
+ * clause_sides_match_join
+ *	  Determine whether a join clause is of the right form to use in this join.
+ *
+ * We already know that the clause is a binary opclause referencing only the
+ * rels in the current join.  The point here is to check whether it has the
+ * form "outerrel_expr op innerrel_expr" or "innerrel_expr op outerrel_expr",
+ * rather than mixing outer and inner vars on either side.  If it matches,
+ * we set the transient flag outer_is_left to identify which side is which.
+ */
+static bool
+clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
+						Relids innerrelids)
+{
+	if (bms_is_subset(rinfo->left_relids, outerrelids) &&
+		bms_is_subset(rinfo->right_relids, innerrelids))
+	{
+		/* lefthand side is outer */
+		rinfo->outer_is_left = true;
+		return true;
+	}
+	else if (bms_is_subset(rinfo->left_relids, innerrelids) &&
+			 bms_is_subset(rinfo->right_relids, outerrelids))
+	{
+		/* righthand side is outer */
+		rinfo->outer_is_left = false;
+		return true;
+	}
+	return false;				/* no good for these input relations */
+}
+
+
+/*
+ * Partitioned table Unique Keys.
+ * A partitioned table's unique key is maintained as follows:
+ * 1. The index must be unique, as usual.
+ * 2. The index must contain the partition key.
+ * 3. The index must exist on all the child rels.  See simple_indexinfo_equal
+ *    for how we compare them.
+ */
+
+/* index_constains_partkey
+ * Return true if the index contains the partition key.
+ */
+static bool
+index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind)
+{
+	ListCell	*lc;
+	int	i;
+	Assert(IS_PARTITIONED_REL(partrel));
+
+	for(i = 0; i < partrel->part_scheme->partnatts; i++)
+	{
+		Node *part_expr = linitial(partrel->partexprs[i]);
+		bool found_in_index = false;
+		foreach(lc, ind->indextlist)
+		{
+			Expr *index_expr = lfirst_node(TargetEntry, lc)->expr;
+			if (equal(index_expr, part_expr))
+			{
+				found_in_index = true;
+				break;
+			}
+		}
+		if (!found_in_index)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * simple_indexinfo_equal
+ *
+ * Used to check whether two indexes are the same.  The index here is COPIED
+ * from the childrel with some tiny changes (see simple_copy_indexinfo_to_parent).
+ */
+
+static bool
+simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2)
+{
+	Size oid_cmp_len = sizeof(Oid) * ind1->ncolumns;
+	return ind1->ncolumns == ind2->ncolumns &&
+		ind1->unique == ind2->unique &&
+		memcmp(ind1->indexkeys, ind2->indexkeys, sizeof(int) * ind1->ncolumns) == 0 &&
+		memcmp(ind1->opfamily, ind2->opfamily, oid_cmp_len) == 0 &&
+		memcmp(ind1->opcintype, ind2->opcintype, oid_cmp_len) == 0 &&
+		memcmp(ind1->sortopfamily, ind2->sortopfamily, oid_cmp_len) == 0 &&
+		equal(ind1->indextlist, ind2->indextlist);
+}
+
+/*
+ * These macros are copied from copyfuncs.c since we don't want to make
+ * simple_copy_indexinfo_to_parent public; it is too customized a copy.
+ */
+
+#define COPY_POINTER_FIELD(fldname, sz) \
+	do { \
+		Size	_size = (sz); \
+		newnode->fldname = palloc(_size); \
+		memcpy(newnode->fldname, from->fldname, _size); \
+	} while (0)
+
+#define COPY_NODE_FIELD(fldname) \
+	(newnode->fldname = copyObjectImpl(from->fldname))
+
+#define COPY_SCALAR_FIELD(fldname) \
+	(newnode->fldname = from->fldname)
+
+
+/*
+ * simple_copy_indexinfo_to_parent
+ * Copy the IndexOptInfo from a child rel to the parent, which will be used to
+ * 1. Test whether the same index exists in all the childrels.
+ * 2. Test whether the parentrel's reltarget/baserestrictinfo matches this index.
+ * The copied and modified index is only used in this scope.
+ */
+static IndexOptInfo *
+simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+								IndexOptInfo *from)
+{
+	IndexOptInfo *newnode = makeNode(IndexOptInfo);
+
+	COPY_SCALAR_FIELD(ncolumns);
+	COPY_SCALAR_FIELD(nkeycolumns);
+	COPY_SCALAR_FIELD(unique);
+	COPY_SCALAR_FIELD(immediate);
+	/* We just need to know if it is NIL or not */
+	COPY_SCALAR_FIELD(indpred);
+	COPY_SCALAR_FIELD(predOK);
+	COPY_POINTER_FIELD(indexkeys, from->ncolumns * sizeof(int));
+	COPY_POINTER_FIELD(indexcollations, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opfamily, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opcintype, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(sortopfamily, from->ncolumns * sizeof(Oid));
+	COPY_NODE_FIELD(indextlist);
+
+	/*
+	 * We have to change the varno so that matching this index against
+	 * rel->reltarget and clauses like "pk = 1" works later.
+	 */
+	ChangeVarNodes((Node*) newnode->indextlist,
+				   from->rel->relid,
+				   parentrel->relid, 0);
+	newnode->rel = parentrel;
+	return newnode;
+}
+
+/*
+ * adjust_partition_unique_indexlist
+ *
+ * Check the currently known unique indexes to see whether each of them
+ * also exists in the given childrel; if not, it is removed from
+ * the list.
+ */
+static void
+adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+								  RelOptInfo *childrel,
+								  List **global_unique_index)
+{
+	ListCell	*lc, *lc2;
+	foreach(lc, *global_unique_index)
+	{
+		IndexOptInfo	*g_ind = lfirst_node(IndexOptInfo, lc);
+		bool found_in_child = false;
+
+		foreach(lc2, childrel->indexlist)
+		{
+			IndexOptInfo   *p_ind = lfirst_node(IndexOptInfo, lc2);
+			IndexOptInfo   *p_ind_copy;
+			if (!p_ind->unique || !p_ind->immediate ||
+				(p_ind->indpred != NIL && !p_ind->predOK))
+				continue;
+			p_ind_copy = simple_copy_indexinfo_to_parent(parentrel, p_ind);
+			if (simple_indexinfo_equal(p_ind_copy, g_ind))
+			{
+				found_in_child = true;
+				break;
+			}
+		}
+
+		if (!found_in_child)
+			/* The same index does not exist on this childrel, remove it */
+			*global_unique_index = foreach_delete_current(*global_unique_index, lc);
+	}
+}
+
+/* helper function for grouprel/distinctrel */
+static void
+add_uniquekey_from_sortgroups(PlannerInfo *root, RelOptInfo *rel, List *sortgroups)
+{
+	Query *parse = root->parse;
+	ListCell *lc, *lc2;
+	List	*exprs = NIL,  *colnos = NIL;
+	foreach(lc, sortgroups)
+	{
+		Index sortref = lfirst_node(SortGroupClause, lc)->tleSortGroupRef;
+		int c = 1;
+		foreach(lc2, parse->targetList)
+		{
+			TargetEntry *tle = lfirst_node(TargetEntry, lc2);
+			if (tle->ressortgroupref == sortref)
+			{
+				exprs = lappend(exprs, tle->expr);
+				colnos = lappend_int(colnos, c);
+			}
+			++c;
+		}
+	}
+	rel->uniquekeys = lappend(rel->uniquekeys, makeUniqueKey(exprs, colnos, true));
+}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0ff660284..b96c6290b7 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -37,7 +37,6 @@ static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
 								  Relids joinrelids);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
-static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
 								List *clause_list);
 static Oid	distinct_col_search(int colno, List *colnos, List *opids);
@@ -178,14 +177,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 
 	innerrel = find_base_rel(root, innerrelid);
 
-	/*
-	 * Before we go to the effort of checking whether any innerrel variables
-	 * are needed above the join, make a quick check to eliminate cases in
-	 * which we will surely be unable to prove uniqueness of the innerrel.
-	 */
-	if (!rel_supports_distinctness(root, innerrel))
-		return false;
-
 	/* Compute the relid set for the join we are considering */
 	joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
 
@@ -535,14 +526,6 @@ reduce_unique_semijoins(PlannerInfo *root)
 
 		innerrel = find_base_rel(root, innerrelid);
 
-		/*
-		 * Before we trouble to run generate_join_implied_equalities, make a
-		 * quick check to eliminate cases in which we will surely be unable to
-		 * prove uniqueness of the innerrel.
-		 */
-		if (!rel_supports_distinctness(root, innerrel))
-			continue;
-
 		/* Compute the relid set for the join we are considering */
 		joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
 
@@ -570,54 +553,6 @@ reduce_unique_semijoins(PlannerInfo *root)
 }
 
 
-/*
- * rel_supports_distinctness
- *		Could the relation possibly be proven distinct on some set of columns?
- *
- * This is effectively a pre-checking function for rel_is_distinct_for().
- * It must return true if rel_is_distinct_for() could possibly return true
- * with this rel, but it should not expend a lot of cycles.  The idea is
- * that callers can avoid doing possibly-expensive processing to compute
- * rel_is_distinct_for()'s argument lists if the call could not possibly
- * succeed.
- */
-static bool
-rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
-{
-	/* We only know about baserels ... */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * For a plain relation, we only know how to prove uniqueness by
-		 * reference to unique indexes.  Make sure there's at least one
-		 * suitable unique index.  It must be immediately enforced, and if
-		 * it's a partial index, it must match the query.  (Keep these
-		 * conditions in sync with relation_has_unique_index_for!)
-		 */
-		ListCell   *lc;
-
-		foreach(lc, rel->indexlist)
-		{
-			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
-
-			if (ind->unique && ind->immediate &&
-				(ind->indpred == NIL || ind->predOK))
-				return true;
-		}
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Query	   *subquery = root->simple_rte_array[rel->relid]->subquery;
-
-		/* Check if the subquery has any qualities that support distinctness */
-		if (query_supports_distinctness(subquery))
-			return true;
-	}
-	/* We have no proof rules for any other rtekinds. */
-	return false;
-}
 
 /*
  * rel_is_distinct_for
@@ -640,83 +575,34 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 static bool
 rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
 {
-	/*
-	 * We could skip a couple of tests here if we assume all callers checked
-	 * rel_supports_distinctness first, but it doesn't seem worth taking any
-	 * risk for.
-	 */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * Examine the indexes to see if we have a matching unique index.
-		 * relation_has_unique_index_for automatically adds any usable
-		 * restriction clauses for the rel, so we needn't do that here.
-		 */
-		if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
-			return true;
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Index		relid = rel->relid;
-		Query	   *subquery = root->simple_rte_array[relid]->subquery;
-		List	   *colnos = NIL;
-		List	   *opids = NIL;
-		ListCell   *l;
 
-		/*
-		 * Build the argument lists for query_is_distinct_for: a list of
-		 * output column numbers that the query needs to be distinct over, and
-		 * a list of equality operators that the output columns need to be
-		 * distinct according to.
-		 *
-		 * (XXX we are not considering restriction clauses attached to the
-		 * subquery; is that worth doing?)
-		 */
-		foreach(l, clause_list)
+	ListCell	*lc1, *lc2,  *lc3;
+	foreach(lc1,  rel->uniquekeys)
+	{
+		UniqueKey *uqk = lfirst_node(UniqueKey, lc1);
+		bool all_uqk_exprs_found = true;
+		foreach(lc2, uqk->exprs)
 		{
-			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
-			Oid			op;
-			Var		   *var;
-
-			/*
-			 * Get the equality operator we need uniqueness according to.
-			 * (This might be a cross-type operator and thus not exactly the
-			 * same operator the subquery would consider; that's all right
-			 * since query_is_distinct_for can resolve such cases.)  The
-			 * caller's mergejoinability test should have selected only
-			 * OpExprs.
-			 */
-			op = castNode(OpExpr, rinfo->clause)->opno;
-
-			/* caller identified the inner side for us */
-			if (rinfo->outer_is_left)
-				var = (Var *) get_rightop(rinfo->clause);
-			else
-				var = (Var *) get_leftop(rinfo->clause);
-
-			/*
-			 * We may ignore any RelabelType node above the operand.  (There
-			 * won't be more than one, since eval_const_expressions() has been
-			 * applied already.)
-			 */
-			if (var && IsA(var, RelabelType))
-				var = (Var *) ((RelabelType *) var)->arg;
-
-			/*
-			 * If inner side isn't a Var referencing a subquery output column,
-			 * this clause doesn't help us.
-			 */
-			if (!var || !IsA(var, Var) ||
-				var->varno != relid || var->varlevelsup != 0)
-				continue;
-
-			colnos = lappend_int(colnos, var->varattno);
-			opids = lappend_oid(opids, op);
+			Node *uq_expr = lfirst(lc2);
+			bool find_uq_exprs_in_clause_list = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *clause_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause): get_leftop(rinfo->clause) ;
+				if (equal(uq_expr, clause_expr))
+				{
+					find_uq_exprs_in_clause_list = true;
+					break;
+				}
+			}
+			if (!find_uq_exprs_in_clause_list)
+			{
+				all_uqk_exprs_found = false;
+				break;
+			}
 		}
-
-		if (query_is_distinct_for(subquery, colnos, opids))
+		if (all_uqk_exprs_found)
+			/* This UniqueKey is what we want */
 			return true;
 	}
 	return false;
@@ -976,13 +862,6 @@ innerrel_is_unique(PlannerInfo *root,
 	if (restrictlist == NIL)
 		return false;
 
-	/*
-	 * Make a quick check to eliminate cases in which we will surely be unable
-	 * to prove uniqueness of the innerrel.
-	 */
-	if (!rel_supports_distinctness(root, innerrel))
-		return false;
-
 	/*
 	 * Query the cache to see if we've managed to prove that innerrel is
 	 * unique for any subset of this outerrel.  We don't need an exact match,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e978b491f6..be78d061ae 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -830,6 +830,15 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		{
 			Node	   *qual = (Node *) lfirst(l);
 
+			/* Record the not-null Vars implied by this qual */
+			ListCell	*lc;
+			foreach(lc, find_nonnullable_vars(qual))
+			{
+				Var *var = lfirst_node(Var, lc);
+				RelOptInfo *rel = root->simple_rel_array[var->varno];
+				if (var->varattno > InvalidAttrNumber)
+					rel->not_null_cols = bms_add_member(rel->not_null_cols, var->varattno);
+			}
 			distribute_qual_to_rels(root, qual,
 									false, below_outer_join, JOIN_INNER,
 									root->qual_security_level,
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..6ad73cb57b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -213,19 +213,6 @@ query_planner(PlannerInfo *root,
 	 */
 	fix_placeholder_input_needed_levels(root);
 
-	/*
-	 * Remove any useless outer joins.  Ideally this would be done during
-	 * jointree preprocessing, but the necessary information isn't available
-	 * until we've built baserel data structures and classified qual clauses.
-	 */
-	joinlist = remove_useless_joins(root, joinlist);
-
-	/*
-	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
-	 * Likewise, this can't be done until now for lack of needed info.
-	 */
-	reduce_unique_semijoins(root);
-
 	/*
 	 * Now distribute "placeholders" to base rels as needed.  This has to be
 	 * done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d6f2153593..e239839a75 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2389,6 +2389,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		add_path(final_rel, path);
 	}
 
+	simple_copy_uniquekeys(current_rel, final_rel);
+
 	/*
 	 * Generate partial paths for final_rel, too, if outer query levels might
 	 * be able to make use of them.
@@ -3813,6 +3815,23 @@ create_grouping_paths(PlannerInfo *root,
 	Query	   *parse = root->parse;
 	RelOptInfo *grouped_rel;
 	RelOptInfo *partially_grouped_rel;
+	List	*required_unique_keys = NIL;
+
+	if (root->parse->groupingSets == NIL)
+	{
+		required_unique_keys  = get_sortgrouplist_exprs(parse->groupClause,
+														parse->targetList);
+		/*
+		 * If the groupby clauses are unique already, the grouping node is
+		 * unnecessary as long as there are no aggregate functions.
+		 */
+		if (required_unique_keys != NIL &&
+			!parse->hasAggs &&
+			!parse->hasWindowFuncs &&
+			parse->havingQual == NULL &&
+			relation_has_uniquekeys_for(root, input_rel, required_unique_keys))
+			return input_rel;
+	}
 
 	/*
 	 * Create grouping relation to hold fully aggregated grouping and/or
@@ -3901,6 +3920,8 @@ create_grouping_paths(PlannerInfo *root,
 	}
 
 	set_cheapest(grouped_rel);
+
+	populate_grouprel_uniquekeys(root, grouped_rel);
 	return grouped_rel;
 }
 
@@ -4617,7 +4638,7 @@ create_window_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(window_rel);
-
+	simple_copy_uniquekeys(input_rel, window_rel);
 	return window_rel;
 }
 
@@ -4736,6 +4757,12 @@ create_distinct_paths(PlannerInfo *root,
 	bool		allow_hash;
 	Path	   *path;
 	ListCell   *lc;
+	List	   *required_unique_keys =  get_sortgrouplist_exprs(parse->distinctClause,
+																parse->targetList);
+
+	/* If the result is unique already, just return the input_rel directly */
+	if (relation_has_uniquekeys_for(root, input_rel, required_unique_keys))
+		return input_rel;
 
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
@@ -4919,7 +4946,7 @@ create_distinct_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(distinct_rel);
-
+	populate_distinctrel_uniquekeys(root, input_rel, distinct_rel);
 	return distinct_rel;
 }
 
@@ -5067,6 +5094,8 @@ create_ordered_paths(PlannerInfo *root,
 	 */
 	Assert(ordered_rel->pathlist != NIL);
 
+	simple_copy_uniquekeys(input_rel, ordered_rel);
+
 	return ordered_rel;
 }
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 1a23e18970..a652d39478 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -689,6 +689,8 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	/* Undo effects of possibly forcing tuple_fraction to 0 */
 	root->tuple_fraction = save_fraction;
 
+	/* Add the UniqueKeys */
+	populate_unionrel_uniquiekeys(root, result_rel);
 	return result_rel;
 }
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d82fc5ab8b..34d30b181c 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -460,6 +461,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->not_null_cols == NULL);
+	for(i = 0; i < relation->rd_att->natts; i++)
+	{
+		if (relation->rd_att->attrs[i].attnotnull)
+			rel->not_null_cols = bms_add_member(rel->not_null_cols, i+1);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 31d9aedeeb..d4de97016c 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -16,6 +16,7 @@
 
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 
 extern A_Expr *makeA_Expr(A_Expr_Kind kind, List *name,
@@ -105,4 +106,5 @@ extern GroupingSet *makeGroupingSet(GroupingSetKind kind, List *content, int loc
 
 extern VacuumRelation *makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols);
 
+extern UniqueKey* makeUniqueKey(List *exprs, List *positions, bool grantee);
 #endif							/* MAKEFUNC_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index baced7eec0..a1511b46ea 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 3d3be197e0..743061ff7a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -687,6 +687,7 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	Relids		not_null_cols;	/* the not-null columns of this relation, numbered from 1 */
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
@@ -706,6 +707,7 @@ typedef struct RelOptInfo
 	QualCost	baserestrictcost;	/* cost of evaluating the above */
 	Index		baserestrict_min_security;	/* min security_level found in
 											 * baserestrictinfo */
+	List	   *uniquekeys;		/* List of UniqueKey */
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
 	bool		has_eclass_joins;	/* T means joininfo is incomplete */
@@ -1017,6 +1019,23 @@ typedef struct PathKey
 } PathKey;
 
 
+/*
+ * UniqueKey
+ *
+ * Represents the uniqueness properties held by a RelOptInfo.
+ * exprs is a list of exprs which are unique on the current RelOptInfo.
+ * positions is a list of the locations of the corresponding exprs in the
+ * current reloptinfo->reltarget.  It is used to translate the exprs'
+ * information when the rel appears as a subquery in an upper query.
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+	List	   *exprs;
+	List	   *positions;
+	bool		grantee;		/* true if uniqueness is guaranteed even considering NULL values */
+} UniqueKey;
+
 /*
  * PathTarget
  *
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 14ea2766ad..5dfb93895c 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -528,6 +528,8 @@ extern bool list_member_ptr(const List *list, const void *datum);
 extern bool list_member_int(const List *list, int datum);
 extern bool list_member_oid(const List *list, Oid datum);
 
+extern bool list_all_members_in(const List *members, const List *target);
+
 extern List *list_delete(List *list, void *datum);
 extern List *list_delete_ptr(List *list, void *datum);
 extern List *list_delete_int(List *list, int datum);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..f1967d15c2 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -240,4 +240,41 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ */
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel,
+										List* unique_index_list);
+extern void populate_partitionedrel_uniquekeys(PlannerInfo *root,
+												RelOptInfo *rel,
+												List *childrels);
+extern void populate_distinctrel_uniquekeys(PlannerInfo *root,
+											RelOptInfo *inputrel,
+											RelOptInfo *distinctrel);
+extern void populate_grouprel_uniquekeys(PlannerInfo *root,
+										 RelOptInfo *grouprel);
+extern void populate_unionrel_uniquiekeys(PlannerInfo *root,
+										  RelOptInfo *unionrel);
+extern void simple_copy_uniquekeys(RelOptInfo *oldrel,
+								   RelOptInfo *newrel);
+extern void convert_subquery_uniquekeys(PlannerInfo *root,
+										RelOptInfo *currel,
+										RelOptInfo *sub_final_rel);
+extern void populate_joinrel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *rel1,
+										RelOptInfo *rel2,
+										List *restrictlist,
+										JoinType jointype);
+
+extern bool innerl_is_unique_v2(PlannerInfo *root,
+								RelOptInfo *outerrel,
+								RelOptInfo *innerrel,
+								List *restrictlist);
+extern bool relation_has_uniquekeys_for(PlannerInfo *root,
+										RelOptInfo *rel,
+										List *exprs);
+
 #endif							/* PATHS_H */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index f457b5b150..b77f70fd55 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -870,14 +870,12 @@ explain (costs off)
   select distinct max(unique2) from tenk1;
                              QUERY PLAN                              
 ---------------------------------------------------------------------
- HashAggregate
-   Group Key: $0
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
                  Index Cond: (unique2 IS NOT NULL)
-   ->  Result
-(7 rows)
+(5 rows)
 
 select distinct max(unique2) from tenk1;
  max  
@@ -1036,7 +1034,7 @@ explain (costs off)
   select distinct min(f1), max(f1) from minmaxtest;
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
- Unique
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Merge Append
@@ -1059,10 +1057,7 @@ explain (costs off)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
-   ->  Sort
-         Sort Key: ($0), ($1)
-         ->  Result
-(26 rows)
+(23 rows)
 
 select distinct min(f1), max(f1) from minmaxtest;
  min | max 
@@ -1092,12 +1087,10 @@ create temp table t2 (x int, y int, z int, primary key (x, y));
 create temp table t3 (a int, b int, c int, primary key(a, b) deferrable);
 -- Non-primary-key columns can be removed from GROUP BY
 explain (costs off) select * from t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 -- No removal can happen if the complete PK is not present in GROUP BY
 explain (costs off) select a,c from t1 group by a,c,d;
@@ -1112,31 +1105,27 @@ explain (costs off) select a,c from t1 group by a,c,d;
 explain (costs off) select *
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.y
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.z
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Cannot optimize when PK is deferrable
 explain (costs off) select * from t3 group by a,b,c;
@@ -1161,12 +1150,10 @@ explain (costs off) select * from t1 group by a,b,c,d;
 
 -- Okay to remove columns if we're only querying the parent.
 explain (costs off) select * from only t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 create temp table p_t1 (
   a int,
@@ -1179,14 +1166,12 @@ create temp table p_t1_1 partition of p_t1 for values in(1);
 create temp table p_t1_2 partition of p_t1 for values in(2);
 -- Ensure we can remove non-PK columns for partitioned tables.
 explain (costs off) select * from p_t1 group by a,b,c,d;
-           QUERY PLAN           
---------------------------------
- HashAggregate
-   Group Key: p_t1.a, p_t1.b
-   ->  Append
-         ->  Seq Scan on p_t1_1
-         ->  Seq Scan on p_t1_2
-(5 rows)
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on p_t1_1
+   ->  Seq Scan on p_t1_2
+(3 rows)
 
 drop table t1 cascade;
 NOTICE:  drop cascades to table t1c
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 761376b007..6e5b8f83f4 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4414,37 +4414,25 @@ select d.* from d left join (select distinct * from b) s
 -- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
 -- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
 -- but this happens too late for join removal in the outer plan level.)
+-- XXXX: since b.id is unique now, the group by clause is erased, so
+-- the join removal can happen now.
 explain (costs off)
 select d.* from d left join (select * from b group by b.id, b.c_id) s
   on d.a = s.id;
-                QUERY PLAN                
-------------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Group
-         Group Key: b.id
-         ->  Index Scan using b_pkey on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(8 rows)
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
 
 -- similarly, but keying off a DISTINCT clause
+-- XXX we erase the distinct clause, so the join removal is ok as well.
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(9 rows)
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index f3696c6d1d..b197e30429 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -244,3 +244,391 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+create table select_distinct_a(pk1 int, pk2 char(20),  uk1 char(20) not null,  uk2 int, e int, primary key(pk1, pk2));
+create unique index select_distinct_a_uk on select_distinct_a(uk1, uk2);
+create table select_distinct_b(a int, b char(20), pk1 char(20), pk2 int, e int, primary key(pk1, pk2));
+-- distinct erased since (pk1, pk2) is the primary key
+explain (costs off) select distinct * from select_distinct_a;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+(1 row)
+
+-- distinct can't be erased since we require all the uk columns to be not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a;
+             QUERY PLAN              
+-------------------------------------
+ HashAggregate
+   Group Key: uk1, uk2
+   ->  Seq Scan on select_distinct_a
+(3 rows)
+
+-- distinct erased since uk + not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 IS NOT NULL)
+(2 rows)
+
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 > 1;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 > 1)
+(2 rows)
+
+-- distinct erased due to group by
+explain (costs off) select distinct e from select_distinct_a group by e;
+             QUERY PLAN              
+-------------------------------------
+ HashAggregate
+   Group Key: e
+   ->  Seq Scan on select_distinct_a
+(3 rows)
+
+-- distinct erased due to the restrictinfo
+explain (costs off) select distinct uk1 from select_distinct_a where pk1 = 1 and pk2 = 'c';
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Index Scan using select_distinct_a_pkey on select_distinct_a
+   Index Cond: ((pk1 = 1) AND (pk2 = 'c'::bpchar))
+(2 rows)
+
+-- test join
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+insert into select_distinct_a values(1, 'a', 'a', 0, 1), (1, 'b', 'A', 0, 2), (3, 'c', 'c', 0, 3);
+insert into select_distinct_b values(1, 'a', 'a', 0, 1), (4, 'd', 'd', 0, 4), (1, 'e', 'e', 0, 5);
+-- Cartesian join
+explain (costs off) select distinct a.uk1, a.uk2, b.pk1, b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null;
+                 QUERY PLAN                  
+---------------------------------------------
+ Nested Loop
+   ->  Seq Scan on select_distinct_b b
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a a
+               Filter: (uk2 IS NOT NULL)
+(5 rows)
+
+select distinct a.uk1, a.uk2, b.pk1, b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null order by 1, 2, 3, 4;
+         uk1          | uk2 |         pk1          | pk2 
+----------------------+-----+----------------------+-----
+ a                    |   0 | a                    |   0
+ a                    |   0 | d                    |   0
+ a                    |   0 | e                    |   0
+ A                    |   0 | a                    |   0
+ A                    |   0 | d                    |   0
+ A                    |   0 | e                    |   0
+ c                    |   0 | a                    |   0
+ c                    |   0 | d                    |   0
+ c                    |   0 | e                    |   0
+(9 rows)
+
+-- normal join
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_a t1, select_distinct_b t2 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Nested Loop
+   ->  Seq Scan on select_distinct_a t1
+   ->  Index Only Scan using select_distinct_b_pkey on select_distinct_b t2
+         Index Cond: ((pk1 = t1.pk2) AND (pk2 = t1.e))
+(4 rows)
+
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_b t2, select_distinct_a t1 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Nested Loop
+   ->  Seq Scan on select_distinct_a t1
+   ->  Index Only Scan using select_distinct_b_pkey on select_distinct_b t2
+         Index Cond: ((pk1 = t1.pk2) AND (pk2 = t1.e))
+(4 rows)
+
+-- left join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a);
+                 QUERY PLAN                  
+---------------------------------------------
+ Nested Loop Left Join
+   Join Filter: (a.pk1 = b.a)
+   ->  Seq Scan on select_distinct_a a
+   ->  Materialize
+         ->  Seq Scan on select_distinct_b b
+(5 rows)
+
select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+ pk1 |         pk2          |         pk1          | pk2 
+-----+----------------------+----------------------+-----
+   1 | a                    | a                    |   0
+   1 | a                    | e                    |   0
+   1 | b                    | a                    |   0
+   1 | b                    | e                    |   0
+   3 | c                    |                      |    
+(5 rows)
+
+-- right join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a);
+                                QUERY PLAN                                 
+---------------------------------------------------------------------------
+ Nested Loop Left Join
+   ->  Seq Scan on select_distinct_b b
+   ->  Index Only Scan using select_distinct_a_pkey on select_distinct_a a
+         Index Cond: (pk1 = b.a)
+(4 rows)
+
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+ pk1 |         pk2          |         pk1          | pk2 
+-----+----------------------+----------------------+-----
+   1 | a                    | a                    |   0
+   1 | a                    | e                    |   0
+   1 | b                    | a                    |   0
+   1 | b                    | e                    |   0
+     |                      | d                    |   0
+(5 rows)
+
+-- full join
+explain (costs off)  select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a);
+                 QUERY PLAN                  
+---------------------------------------------
+ Hash Full Join
+   Hash Cond: (a.pk1 = b.a)
+   ->  Seq Scan on select_distinct_a a
+   ->  Hash
+         ->  Seq Scan on select_distinct_b b
+(5 rows)
+
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+ pk1 |         pk2          |         pk1          | pk2 
+-----+----------------------+----------------------+-----
+   1 | a                    | a                    |   0
+   1 | a                    | e                    |   0
+   1 | b                    | a                    |   0
+   1 | b                    | e                    |   0
+   3 | c                    |                      |    
+     |                      | d                    |   0
+(6 rows)
+
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1 from select_distinct_a a inner join select_distinct_b b on (a.pk1 = b.a);
+                                   QUERY PLAN                                    
+---------------------------------------------------------------------------------
+ HashAggregate
+   Group Key: a.pk1, a.pk2, b.pk1
+   ->  Nested Loop
+         ->  Seq Scan on select_distinct_b b
+         ->  Index Only Scan using select_distinct_a_pkey on select_distinct_a a
+               Index Cond: (pk1 = b.a)
+(6 rows)
+
+-- Semi/anti join
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 in (select a from select_distinct_b);
+                               QUERY PLAN                                
+-------------------------------------------------------------------------
+ Nested Loop
+   ->  HashAggregate
+         Group Key: select_distinct_b.a
+         ->  Seq Scan on select_distinct_b
+   ->  Index Only Scan using select_distinct_a_pkey on select_distinct_a
+         Index Cond: (pk1 = select_distinct_b.a)
+(6 rows)
+
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 not in (select a from select_distinct_b);
+              QUERY PLAN               
+---------------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on select_distinct_b
+(4 rows)
+
+-- Test subquery
+explain (costs off) select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a;
+                              QUERY PLAN                              
+----------------------------------------------------------------------
+ Nested Loop
+   ->  HashAggregate
+         Group Key: select_distinct_b.a
+         ->  Seq Scan on select_distinct_b
+   ->  Index Scan using select_distinct_a_pkey on select_distinct_a a
+         Index Cond: (pk1 = select_distinct_b.a)
+(6 rows)
+
+select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a order by 1, 2, 3;
+ pk1 |         pk2          |         uk1          | uk2 | e | a 
+-----+----------------------+----------------------+-----+---+---
+   1 | a                    | a                    |   0 | 1 | 1
+   1 | b                    | A                    |   0 | 2 | 1
+(2 rows)
+
+explain (costs off) select distinct * from select_distinct_a a,  (select distinct a from select_distinct_b) b where a.pk1 = b.a;
+                              QUERY PLAN                              
+----------------------------------------------------------------------
+ Nested Loop
+   ->  HashAggregate
+         Group Key: select_distinct_b.a
+         ->  Seq Scan on select_distinct_b
+   ->  Index Scan using select_distinct_a_pkey on select_distinct_a a
+         Index Cond: (pk1 = select_distinct_b.a)
+(6 rows)
+
+select distinct * from select_distinct_a a, (select distinct a from select_distinct_b) b where a.pk1 = b.a order by 1 ,2, 3;
+ pk1 |         pk2          |         uk1          | uk2 | e | a 
+-----+----------------------+----------------------+-----+---+---
+   1 | a                    | a                    |   0 | 1 | 1
+   1 | b                    | A                    |   0 | 2 | 1
+(2 rows)
+
+-- Distinct On
+-- can't erase since pk2 is missed
+explain (costs off) select distinct on(pk1) pk1, pk2 from select_distinct_a;
+                QUERY PLAN                 
+-------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: pk1
+         ->  Seq Scan on select_distinct_a
+(4 rows)
+
+-- ok to erase
+explain (costs off) select distinct on(pk1, pk2) pk1, pk2 from select_distinct_a;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+(1 row)
+
+-- Test partitioned table
+create table dist_p (a int, b int not null, c int not null, d int) partition by  range (b);
+create table dist_p0 partition of dist_p for values from (1) to (10);
+create table dist_p1 partition of dist_p for values from (11) to (20);
+-- The combined UKs in the target list
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.maxc from select_distinct_a t1, (select b, max(c) as maxc from dist_p group by b) t2;
+                      QUERY PLAN                      
+------------------------------------------------------
+ Nested Loop
+   ->  Seq Scan on select_distinct_a t1
+   ->  Materialize
+         ->  HashAggregate
+               Group Key: dist_p.b
+               ->  Append
+                     ->  Seq Scan on dist_p0 dist_p_1
+                     ->  Seq Scan on dist_p1 dist_p_2
+(8 rows)
+
+-- no uk for t2, so distinct is needed.
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                        QUERY PLAN                        
+----------------------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: t1.pk1, t1.pk2, t2.b, t2.c
+         ->  Nested Loop
+               ->  Append
+                     ->  Seq Scan on dist_p0 t2_1
+                     ->  Seq Scan on dist_p1 t2_2
+               ->  Materialize
+                     ->  Seq Scan on select_distinct_a t1
+(9 rows)
+
+-- create unique index on dist_p
+create unique index dist_p_uk_b_c on dist_p(b, c);
+-- (t2.b, t2.c) should be the UK
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                  QUERY PLAN                  
+----------------------------------------------
+ Nested Loop
+   ->  Append
+         ->  Seq Scan on dist_p0 t2_1
+         ->  Seq Scan on dist_p1 t2_2
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a t1
+(6 rows)
+
+drop index dist_p_uk_b_c;
+-- we also support creating a unique index on each child table
+create unique index dist_p0_uk_bc on dist_p0(b, c);
+-- not ok, since dist_p1 does not have the exact same index
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                        QUERY PLAN                        
+----------------------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: t1.pk1, t1.pk2, t2.b, t2.c
+         ->  Nested Loop
+               ->  Append
+                     ->  Seq Scan on dist_p0 t2_1
+                     ->  Seq Scan on dist_p1 t2_2
+               ->  Materialize
+                     ->  Seq Scan on select_distinct_a t1
+(9 rows)
+
+create unique index dist_p1_uk_bc on dist_p1(b, c);
+-- OK now
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                  QUERY PLAN                  
+----------------------------------------------
+ Nested Loop
+   ->  Append
+         ->  Seq Scan on dist_p0 t2_1
+         ->  Seq Scan on dist_p1 t2_2
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a t1
+(6 rows)
+
+-- uk is the same on all child tables; however it doesn't include the partkey, so not ok either.
+create unique index dist_p0_uk_c on dist_p0(c);
+create unique index dist_p1_uk_c on dist_p1(c);
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.c from select_distinct_a t1, dist_p t2;
+                        QUERY PLAN                        
+----------------------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: t1.pk1, t1.pk2, t2.c
+         ->  Nested Loop
+               ->  Append
+                     ->  Seq Scan on dist_p0 t2_1
+                     ->  Seq Scan on dist_p1 t2_2
+               ->  Materialize
+                     ->  Seq Scan on select_distinct_a t1
+(9 rows)
+
+drop table dist_p;
+-- test some view.
+create view distinct_v1 as select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+explain (costs off) select * from distinct_v1;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 IS NOT NULL)
+(2 rows)
+
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) select * from distinct_v1;
+                        QUERY PLAN                         
+-----------------------------------------------------------
+ HashAggregate
+   Group Key: select_distinct_a.uk1, select_distinct_a.uk2
+   ->  Seq Scan on select_distinct_a
+         Filter: (uk2 IS NOT NULL)
+(4 rows)
+
+alter table select_distinct_a alter column uk1 set not null;
+-- test generic plan
+prepare pt as select * from distinct_v1;
+explain (costs off)  execute pt;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 IS NOT NULL)
+(2 rows)
+
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) execute pt;
+                        QUERY PLAN                         
+-----------------------------------------------------------
+ HashAggregate
+   Group Key: select_distinct_a.uk1, select_distinct_a.uk2
+   ->  Seq Scan on select_distinct_a
+         Filter: (uk2 IS NOT NULL)
+(4 rows)
+
+drop view distinct_v1;
+drop table select_distinct_a;
+drop table select_distinct_b;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 5fc6617369..d6cc8fa845 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1515,11 +1515,14 @@ select d.* from d left join (select distinct * from b) s
 -- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
 -- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
 -- but this happens too late for join removal in the outer plan level.)
+-- XXXX: since b.id is unique now, the group by clause is erased, so
+-- the join removal can happen now.
 explain (costs off)
 select d.* from d left join (select * from b group by b.id, b.c_id) s
   on d.a = s.id;
 
 -- similarly, but keying off a DISTINCT clause
+-- XXX we erase the distinct clause, so the join removal is ok as well.
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index a605e86449..28651b98f1 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -73,3 +73,125 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+create table select_distinct_a(pk1 int, pk2 char(20),  uk1 char(20) not null,  uk2 int, e int, primary key(pk1, pk2));
+create unique index select_distinct_a_uk on select_distinct_a(uk1, uk2);
+create table select_distinct_b(a int, b char(20), pk1 char(20), pk2 int, e int, primary key(pk1, pk2));
+
+-- distinct erased since (pk1, pk2) is the primary key
+explain (costs off) select distinct * from select_distinct_a;
+
+-- distinct can't be erased since we require all the uk columns to be not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a;
+
+-- distinct erased since uk + not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 > 1;
+
+-- distinct erased due to group by
+explain (costs off) select distinct e from select_distinct_a group by e;
+
+-- distinct erased due to the restrictinfo
+explain (costs off) select distinct uk1 from select_distinct_a where pk1 = 1 and pk2 = 'c';
+
+-- test join
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+insert into select_distinct_a values(1, 'a', 'a', 0, 1), (1, 'b', 'A', 0, 2), (3, 'c', 'c', 0, 3);
+insert into select_distinct_b values(1, 'a', 'a', 0, 1), (4, 'd', 'd', 0, 4), (1, 'e', 'e', 0, 5);
+
+-- Cartesian join
+explain (costs off) select distinct a.uk1, a.uk2, b.pk1, b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null;
+select distinct a.uk1, a.uk2, b.pk1, b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null order by 1, 2, 3, 4;
+
+-- normal join
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_a t1, select_distinct_b t2 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_b t2, select_distinct_a t1 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+
+-- left join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a);
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+
+-- right join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a);
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+
+-- full join
+explain (costs off)  select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a);
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1 from select_distinct_a a inner join select_distinct_b b on (a.pk1 = b.a);
+
+
+-- Semi/anti join
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 in (select a from select_distinct_b);
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 not in (select a from select_distinct_b);
+
+
+-- Test subquery
+explain (costs off) select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a;
+select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a order by 1, 2, 3;
+
+explain (costs off) select distinct * from select_distinct_a a,  (select distinct a from select_distinct_b) b where a.pk1 = b.a;
+select distinct * from select_distinct_a a, (select distinct a from select_distinct_b) b where a.pk1 = b.a order by 1 ,2, 3;
+
+-- Distinct On
+-- can't erase since pk2 is missed
+explain (costs off) select distinct on(pk1) pk1, pk2 from select_distinct_a;
+-- ok to erase
+explain (costs off) select distinct on(pk1, pk2) pk1, pk2 from select_distinct_a;
+
+
+-- Test partitioned table
+create table dist_p (a int, b int not null, c int not null, d int) partition by  range (b);
+
+create table dist_p0 partition of dist_p for values from (1) to (10);
+create table dist_p1 partition of dist_p for values from (11) to (20);
+
+-- The combined UKs in the target list
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.maxc from select_distinct_a t1, (select b, max(c) as maxc from dist_p group by b) t2;
+
+-- no uk for t2, so distinct is needed.
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+
+-- create unique index on dist_p
+create unique index dist_p_uk_b_c on dist_p(b, c);
+
+-- (t2.b, t2.c) should be the UK
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+
+drop index dist_p_uk_b_c;
+
+-- we also support creating a unique index on each child table
+create unique index dist_p0_uk_bc on dist_p0(b, c);
+-- not ok, since dist_p1 does not have the exact same index
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+create unique index dist_p1_uk_bc on dist_p1(b, c);
+-- OK now
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+
+-- uk is the same on all child tables; however it doesn't include the partkey, so not ok either.
+create unique index dist_p0_uk_c on dist_p0(c);
+create unique index dist_p1_uk_c on dist_p1(c);
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.c from select_distinct_a t1, dist_p t2;
+
+drop table dist_p;
+
+-- test some view.
+create view distinct_v1 as select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+explain (costs off) select * from distinct_v1;
+
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) select * from distinct_v1;
+
+alter table select_distinct_a alter column uk1 set not null;
+
+-- test generic plan
+prepare pt as select * from distinct_v1;
+explain (costs off)  execute pt;
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) execute pt;
+
+drop view distinct_v1;
+drop table select_distinct_a;
+drop table select_distinct_b;
-- 
2.21.0

#2Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#1)
1 attachment(s)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Mon, Mar 23, 2020 at 6:21 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Greetings.

This thread is a follow-up to [1], where I submitted a patch for
erasing the distinct node when we know for sure that the data is
unique. Since the implementation has changed a lot from the beginning
and the two threads are not closely related, I am starting this new
thread to discuss the new strategy and save reviewers' time.

As I said above, my original intention was just to erase the distinct
clause. Tom Lane then suggested the function query_is_distinct_for,
and I found the uniqueness can also be used for costing,
remove_useless_join and reduce_unique_semijoins.
David suggested maintaining the uniqueness from bottom to top, across
joins & subqueries, group-by, distinct, union and so on (we call it a
UniqueKey).
Ideally the uniqueness will never be lost in any case. The current
implementation follows David's suggestion, and thanks also to Ashutosh
who reminded me the cost should be ok, since I had concerns about that
at the beginning.

A new field named uniquekeys was added in the RelOptInfo struct; it is
a list of UniqueKey structs.

typedef struct UniqueKey
{
    NodeTag     type;
    List       *exprs;
    List       *positions;
    bool        grantee;
} UniqueKey;

exprs is a list of exprs which is unique if we don't care about the
null values on the current RelOptInfo.

positions is a list of the sequence numbers of the exprs in the
current RelOptInfo, which is used for subqueries, like:

create table t1 (a int primary key, b int);
create table t2 (a int primary key, b int);
select .. from t1, (select b from t2 group by b) t2 ..;

The UniqueKey for the subquery will be Var(varno=1, varattno=2), but
for the top query, the UniqueKey of t2 should be Var(varno=2,
varattno=1); the 1 here needs to be calculated from
UniqueKey->positions.
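As an illustration only, the position-to-Var translation described above could be sketched as follows (hypothetical Python, not the patch's C code; translate_unique_key and its arguments are invented names):

```python
# Hypothetical sketch of the UniqueKey->positions translation described
# above; this is not the patch's actual C code.

def translate_unique_key(positions, outer_varno):
    """Map the 1-based target-list positions of a subquery's unique
    exprs to (varno, varattno) pairs usable at the outer query level."""
    return [(outer_varno, pos) for pos in positions]

# "select b from t2 group by b" exposes t2.b at position 1 of its
# target list; if that subquery is rangetable entry 2 in the outer
# query, the outer-level UniqueKey becomes Var(varno=2, varattno=1).
print(translate_unique_key([1], 2))  # [(2, 1)]
```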

The grantee field is introduced mainly for remove_useless_join &
reduce_unique_semijoins.
Take the above case for example:

-- b is nullable, so select b from t2 can still result in duplicated
rows.
create unique index t2_uk_b on t2(b);

-- the left join can still be removed, since t2.b has a unique index
and the nullability doesn't matter here.
select t1.* from t1 left join t2 on (t1.b = t2.b);

so t2.b will still be a UniqueKey for t2, just with grantee = false.
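A rough sketch of how a consumer might treat grantee, assuming (as described above) that grantee = false means uniqueness holds only among non-null values (hypothetical Python; key_usable and nulls_matter are invented names):

```python
# Hypothetical sketch: whether a UniqueKey with grantee=False can be
# used, based on whether the caller must treat NULLs as duplicates.

def key_usable(grantee, nulls_matter):
    # grantee=True: unique even with NULLs present, always usable.
    # grantee=False: unique only among non-null values; still fine for
    # join removal over an equality clause, since NULLs never match.
    return grantee or not nulls_matter

# DISTINCT elimination must consider NULL duplicates; left-join removal
# on an equality condition does not.
print(key_usable(False, nulls_matter=True))   # False
print(key_usable(False, nulls_matter=False))  # True
```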

A bunch of functions like populate_xxx_uniquekeys manipulate
uniquekeys for many cases; xxx may be baserel, joinrel, partitioned
table, unionrel, groupbyrel, distinctrel and so on. Partitioned tables
have some non-obvious troubles, because users can create indexes on
the child rels directly and differently. You can check the comments in
the code for details.

When maintaining the uniquekeys of a joinrel, we have a rule that if
both rels have UniqueKeys, then any combination from the 2 sides is a
uniquekey of the joinrel. I used two algorithms to keep the length of
the UniqueKeys short. One is that we only add useful UniqueKeys to
RelOptInfo.uniquekeys: if an expr isn't shown in
rel->reltarget->exprs, it will not be used by others, so we can ignore
it safely. The other is that if a column set A is already a uniquekey,
no superset of A needs to be added as a UniqueKey.
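The two pruning rules above could be sketched like this (hypothetical Python with invented names; the real patch works on C Lists of exprs):

```python
# Hypothetical sketch of the two rules that keep the UniqueKey list
# short: drop keys whose exprs are not all in the rel's target list,
# and skip any candidate that is a superset of an existing key.

def is_useful(key, reltarget_exprs):
    # Rule 1: a key only matters if all of its exprs remain visible in
    # the rel's target list.
    return all(e in reltarget_exprs for e in key)

def add_unique_key(uniquekeys, candidate):
    # Rule 2: a superset of an existing key carries no new information.
    for key in uniquekeys:
        if set(key) <= set(candidate):
            return uniquekeys
    return uniquekeys + [candidate]

keys = add_unique_key([], ["t1.a"])
keys = add_unique_key(keys, ["t1.a", "t2.x"])  # superset: skipped
print(keys)  # [['t1.a']]
```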

The overall cost of maintaining uniquekeys should be ok. If you check
the code, you may find many 2- or 3-level foreach loops, but most of
them start from a unique index, and I used UniqueKeyContext and
SubqueryUniqueKeyContext in the joinrel and subquery cases to avoid
too many loops.

Now I have used the UniqueKey to erase unnecessary distinct/group by
nodes, and also changed rel_is_distinct_for to use UniqueKeys, so we
can handle more cases.

create table m1 (a int primary key, b int, c int);
create table m2 (a int primary key, b int, c int);
create table m3 (a int primary key, b int, c int);

With the current patch, we get:
task3=# explain select t1.a from m3 t1 left join (select m1.a from m1,
m2 where m1.b = m2.a limit 1) t2 on (t1.a = t2.a);
                        QUERY PLAN
---------------------------------------------------------
 Seq Scan on m3 t1  (cost=0.00..32.60 rows=2260 width=4)

Before the patch, we will get:
postgres=# explain select t1.a from m3 t1 left join (select m1.a from
m1, m2 where m1.b = m2.a limit 1) t2 on (t1.a = t2.a)
postgres-# ;
QUERY PLAN

-----------------------------------------------------------------------------------------------
Hash Left Join (cost=0.39..41.47 rows=2260 width=4)
Hash Cond: (t1.a = m1.a)
-> Seq Scan on m3 t1 (cost=0.00..32.60 rows=2260 width=4)
-> Hash (cost=0.37..0.37 rows=1 width=4)
-> Limit (cost=0.15..0.36 rows=1 width=4)
-> Nested Loop (cost=0.15..470.41 rows=2260 width=4)
-> Seq Scan on m1 (cost=0.00..32.60 rows=2260
width=8)
-> Index Only Scan using m2_pkey on m2
(cost=0.15..0.19 rows=1 width=4)
Index Cond: (a = m1.b)

The "limit 1" here is just want to avoid the pull_up_subquery to pull up
the subquery,
I think we may still have opportunities to improve this further if we
check if we can
remove a join *just before we join 2 relations*. we may have the similar
situation
for reduce_unique_semijions joins. After the changes has been done, we
can remove
the "limit 1" here to show the diffidence. I didn't include this change
in current patch
since I think the effort may be not small and I want to keep this patch
simple.

Some known issues need attention:
1. I didn't check collations anywhere. One reason is that
relation_has_unique_index_for doesn't check them either. The other
reason is that if a column is unique under collation A, it should be
unique under collation B as well, so we can ignore them. [2]
2. The test case contrib/postgres_fdw/sql/postgres_fdw.sql still fails.
I am not sure whether the bug is in my patch or not.

Kindly waiting for your feedback. Thank you!

[1]
/messages/by-id/CAKU4AWqOORqW900O-+L4L2+0xknsEqpfcs9FF7SeiO9TmpeZOg@mail.gmail.com

[2]
/messages/by-id/CAKU4AWqOORqW900O-+L4L2+0xknsEqpfcs9FF7SeiO9TmpeZOg@mail.gmail.com

Just updated the patch with some test case changes:
1. Add an "ANALYZE" command before running the explains.
2. Order by with an explicit collate setting.
3. As for postgres_fdw.sql, I just copied results.out to expected.out,
which should be correct based on the results. However, I added my
comments around that.

Now I suppose the cfbot should pass this time.

Best Regards.
Andy Fan

Attachments:

v2-0001-Maintain-the-uniqueness-of-a-Query-from-bottom-to.patch (application/octet-stream)
From e8bdec93824d3cf186b4e7ad15f077e199d010a2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 23 Mar 2020 17:58:00 +0800
Subject: [PATCH v2] Maintain the uniqueness of a Query from bottom to top. The
 UniqueKey

can be used to erase distinct, group and improve remove_useless_join
& reduce_unique_semijoins
---
 .../postgres_fdw/expected/postgres_fdw.out    |   36 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 src/backend/nodes/equalfuncs.c                |    3 +-
 src/backend/nodes/list.c                      |   27 +
 src/backend/nodes/makefuncs.c                 |   15 +
 src/backend/optimizer/path/Makefile           |    3 +-
 src/backend/optimizer/path/allpaths.c         |   25 +-
 src/backend/optimizer/path/joinrels.c         |    2 +
 src/backend/optimizer/path/uniquekeys.c       | 1005 +++++++++++++++++
 src/backend/optimizer/plan/analyzejoins.c     |  171 +--
 src/backend/optimizer/plan/initsplan.c        |    9 +
 src/backend/optimizer/plan/planmain.c         |   13 -
 src/backend/optimizer/plan/planner.c          |   33 +-
 src/backend/optimizer/prep/prepunion.c        |    2 +
 src/backend/optimizer/util/plancat.c          |    8 +
 src/include/nodes/makefuncs.h                 |    2 +
 src/include/nodes/nodes.h                     |    1 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/pg_list.h                   |    2 +
 src/include/optimizer/paths.h                 |   37 +
 src/test/regress/expected/aggregates.out      |   83 +-
 src/test/regress/expected/join.out            |   34 +-
 src/test/regress/expected/select_distinct.out |  395 +++++++
 src/test/regress/sql/join.sql                 |    3 +
 src/test/regress/sql/select_distinct.sql      |  125 ++
 25 files changed, 1797 insertions(+), 258 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekeys.c

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 62c2697920..e50e548934 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2900,24 +2900,24 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 f
 (1 row)
 
 -- Outer query is aggregation query
+-- XXX: we get 200 rows from ft2 t2 where t2.c2 % 6 = 0, however the overall query returns just 1 row, which looks strange.
+--      But since it works like a normal aggregation without a GROUP BY clause, I'd say the distinct path can be erased.
 explain (verbose, costs off)
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
- Unique
+                                                       QUERY PLAN                                                       
+------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: ((SubPlan 1))
-   ->  Sort
-         Output: ((SubPlan 1))
-         Sort Key: ((SubPlan 1))
-         ->  Foreign Scan
-               Output: (SubPlan 1)
-               Relations: Aggregate on (public.ft2 t2)
-               Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
-               SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+   Sort Key: ((SubPlan 1))
+   ->  Foreign Scan
+         Output: (SubPlan 1)
+         Relations: Aggregate on (public.ft2 t2)
+         Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
+         SubPlan 1
+           ->  Foreign Scan on public.ft1 t1
+                 Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                 Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(11 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
@@ -3416,7 +3416,6 @@ select sum(q.a), count(q.b) from ft4 left join (select 13, avg(ft1.c1), sum(ft2.
    Output: sum(q.a), count(q.b)
    ->  Nested Loop Left Join
          Output: q.a, q.b
-         Inner Unique: true
          Join Filter: ((ft4.c1)::numeric <= q.b)
          ->  Foreign Scan on public.ft4
                Output: ft4.c1, ft4.c2, ft4.c3
@@ -3429,7 +3428,7 @@ select sum(q.a), count(q.b) from ft4 left join (select 13, avg(ft1.c1), sum(ft2.
                            Output: 13, (avg(ft1.c1)), NULL::bigint
                            Relations: Aggregate on ((public.ft2) LEFT JOIN (public.ft1))
                            Remote SQL: SELECT 13, avg(r1."C 1"), NULL::bigint FROM ("S 1"."T 1" r2 LEFT JOIN "S 1"."T 1" r1 ON (((r1."C 1" = r2."C 1"))))
-(17 rows)
+(16 rows)
 
 select sum(q.a), count(q.b) from ft4 left join (select 13, avg(ft1.c1), sum(ft2.c1) from ft1 right join ft2 on (ft1.c1 = ft2.c1)) q(a, b, c) on (ft4.c1 <= q.b);
  sum | count 
@@ -4198,7 +4197,6 @@ explain (verbose, costs off) select * from ft3 f, loct3 l
 -------------------------------------------------------------
  Hash Join
    Output: f.f1, f.f2, f.f3, l.f1, l.f2, l.f3
-   Inner Unique: true
    Hash Cond: ((f.f3)::text = (l.f3)::text)
    ->  Foreign Scan on public.ft3 f
          Output: f.f1, f.f2, f.f3
@@ -4208,7 +4206,7 @@ explain (verbose, costs off) select * from ft3 f, loct3 l
          ->  Index Scan using loct3_f1_key on public.loct3 l
                Output: l.f1, l.f2, l.f3
                Index Cond: (l.f1 = 'foo'::text)
-(12 rows)
+(11 rows)
 
 -- ===================================================================
 -- test writable foreign table stuff
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..10faec0cce 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -746,6 +746,8 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 f
 select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 from ft1 where c2 = 6 group by c2;
 
 -- Outer query is aggregation query
+-- XXX: we get 200 rows from ft2 t2 where t2.c2 % 6 = 0, however the overall query returns just 1 row, which looks strange.
+--      But since it works like a normal aggregation without a GROUP BY clause, I'd say the distinct path can be erased.
 explain (verbose, costs off)
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 88b912977e..63e92d94ef 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -162,8 +162,9 @@ _equalIntoClause(const IntoClause *a, const IntoClause *b)
 static bool
 _equalVar(const Var *a, const Var *b)
 {
-	COMPARE_SCALAR_FIELD(varno);
+	/* Compare varattno first since it has higher selectivity than varno */
 	COMPARE_SCALAR_FIELD(varattno);
+	COMPARE_SCALAR_FIELD(varno);
 	COMPARE_SCALAR_FIELD(vartype);
 	COMPARE_SCALAR_FIELD(vartypmod);
 	COMPARE_SCALAR_FIELD(varcollid);
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index bd0c58cd81..9c121f5d75 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -688,6 +688,33 @@ list_member_oid(const List *list, Oid datum)
 	return false;
 }
 
+/*
+ * Return true iff there is an equal member in target for every
+ * member in members
+ */
+bool
+list_all_members_in(const List *members, const List *target)
+{
+	const ListCell	*lc1, *lc2;
+	if (target == NIL && members != NIL)
+		return false;
+	foreach(lc1, members)
+	{
+		bool found = false;
+		foreach(lc2, target)
+		{
+			if (equal(lfirst(lc1), lfirst(lc2)))
+			{
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			return false;
+	}
+	return true;
+}
+
 /*
  * Delete the n'th cell (counting from 0) in list.
  *
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index e8cdc90c31..163d352c4f 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -809,3 +809,18 @@ makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols)
 	v->va_cols = va_cols;
 	return v;
 }
+
+
+/*
+ * makeUniqueKey
+ */
+UniqueKey*
+makeUniqueKey(List *exprs, List* positions, bool grantee)
+{
+	UniqueKey * ukey = makeNode(UniqueKey);
+	Assert(list_length(exprs) == list_length(positions));
+	ukey->exprs = exprs;
+	ukey->positions = positions;
+	ukey->grantee = grantee;
+	return ukey;
+}
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..7b9820c25f 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekeys.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 905bbe77d8..e7383979aa 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -39,6 +39,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
@@ -222,14 +223,23 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	set_base_rel_pathlists(root);
 
 	/*
-	 * Generate access paths for the entire join tree.
+	 * Remove any useless outer joins.  Ideally this would be done during
+	 * jointree preprocessing, but the necessary information isn't available
+	 * until we've built baserel data structures, classified qual clauses
+	 * and uniquekeys
 	 */
-	rel = make_rel_from_joinlist(root, joinlist);
+	joinlist = remove_useless_joins(root, joinlist);
 
 	/*
-	 * The result should join all and only the query's base rels.
+	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
+	 * Likewise, this can't be done until now for lack of needed info.
 	 */
-	Assert(bms_equal(rel->relids, root->all_baserels));
+	reduce_unique_semijoins(root);
+
+	/*
+	 * Generate access paths for the entire join tree.
+	 */
+	rel = make_rel_from_joinlist(root, joinlist);
 
 	return rel;
 }
@@ -786,6 +796,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
+
+	/* Set UniqueKeys for this relation */
+	populate_baserel_uniquekeys(root, rel, rel->indexlist);
 }
 
 /*
@@ -1276,6 +1289,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Add paths to the append relation. */
 	add_paths_to_append_rel(root, rel, live_childrels);
+	if (IS_PARTITIONED_REL(rel))
+		populate_partitionedrel_uniquekeys(root, rel, live_childrels);
 }
 
 
@@ -2349,6 +2364,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 										  pathkeys, required_outer));
 	}
 
+	convert_subquery_uniquekeys(root, rel, sub_final_rel);
+
 	/* If outer rel allows parallelism, do same for partial paths. */
 	if (rel->consider_parallel && bms_is_empty(required_outer))
 	{
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index a21c295b99..c6799aa48c 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -920,6 +920,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
 
 	/* Apply partitionwise join technique, if possible. */
 	try_partitionwise_join(root, rel1, rel2, joinrel, sjinfo, restrictlist);
+
+	populate_joinrel_uniquekeys(root, joinrel, rel1, rel2, restrictlist, sjinfo->jointype);
 }
 
 
diff --git a/src/backend/optimizer/path/uniquekeys.c b/src/backend/optimizer/path/uniquekeys.c
new file mode 100644
index 0000000000..0eb904212a
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekeys.c
@@ -0,0 +1,1005 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekeys.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "rewrite/rewriteManip.h"
+
+
+/*
+ * This struct is used to help populate_joinrel_uniquekeys.
+ * Set added_to_joinrel to true if a uniquekey has been added to joinrel.
+ * For a joinrel, if both sides have UniqueKeys, then the combination of them
+ * must be unique for the joinrel as well, but we don't need to add it if
+ * either of them has been added to joinrel already. We use this struct to
+ * maintain such info.
+ */
+typedef struct UniqueKeyContextData
+{
+	UniqueKey	*uniquekey;
+	/* Set to true if the unique key has been added to joinrel->uniquekeys */
+	bool	added_to_joinrel;
+	/* If this uniquekey is still useful after join */
+	bool	useful;
+} *UniqueKeyContext;
+
+
+static List *gather_mergeable_baserestrictlist(RelOptInfo *rel);
+static List *gather_mergeable_joinclauses(RelOptInfo *joinrel,
+										  RelOptInfo *rel1,
+										  RelOptInfo *rel2,
+										  List *restirctlist,
+										  JoinType jointype);
+static bool match_index_to_baserestrictinfo(IndexOptInfo *unique_ind,
+											List *restrictlist);
+static List *initililze_unqiuecontext_for_joinrel(RelOptInfo *joinrel,
+												  RelOptInfo *inputrel);
+
+static bool innerrel_keeps_unique(PlannerInfo *root,
+								  RelOptInfo *outerrel,
+								  RelOptInfo *innerrel,
+								  List *restrictlist,
+								  bool reverse);
+static bool clause_sides_match_join(RestrictInfo *rinfo,
+									Relids outerrelids,
+									Relids innerrelids);
+static void add_uniquekey_from_index(RelOptInfo *rel,
+									 IndexOptInfo *unique_index);
+static void add_uniquekey_for_onerow(RelOptInfo *rel);
+
+/* Used for unique index checking for partitioned tables */
+static bool index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind);
+static IndexOptInfo *simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+													 IndexOptInfo *from);
+static bool simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2);
+static void adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+											  RelOptInfo *childrel,
+											  List **global_unique_index);
+/* Helper functions for grouprel/distinctrel */
+static void add_uniquekey_from_sortgroups(PlannerInfo *root,
+										  RelOptInfo *rel,
+										  List *sortgroups);
+
+/*
+ * populate_baserel_uniquekeys
+ *		Populate 'baserel' uniquekeys list by looking at the rel's unique indexes
+ * and baserestrictinfo
+ */
+void
+populate_baserel_uniquekeys(PlannerInfo *root,
+							RelOptInfo *baserel,
+							List *indexlist)
+{
+	ListCell *lc;
+	List	*restrictlist = gather_mergeable_baserestrictlist(baserel);
+	bool	return_one_row = false;
+	List	*matched_uk_indexes = NIL;
+
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	if (root->parse->hasTargetSRFs)
+		return;
+	foreach(lc, indexlist)
+	{
+		IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		if (match_index_to_baserestrictinfo(ind, restrictlist))
+		{
+			return_one_row = true;
+			break;
+		}
+		/* We can't guarantee an expression will not return a NULL */
+		if (ind->indexprs == NIL)
+			matched_uk_indexes = lappend(matched_uk_indexes, ind);
+	}
+
+	if (return_one_row)
+	{
+		/*
+		 * Since only 1 row is returned, any column is unique
+		 */
+		add_uniquekey_for_onerow(baserel);
+	}
+	else
+	{
+		foreach(lc, matched_uk_indexes)
+			add_uniquekey_from_index(baserel, lfirst_node(IndexOptInfo, lc));
+	}
+}
+
+
+/*
+ * populate_partitionedrel_uniquekeys
+ * A unique index can be used for a UniqueKey if:
+ * 1). It includes the partition keys
+ * 2). All the childrels have the same index.
+ */
+void
+populate_partitionedrel_uniquekeys(PlannerInfo *root,
+									RelOptInfo *rel,
+									List *childrels)
+{
+	ListCell	*lc;
+	List	*global_unique_indexlist = NIL;
+	RelOptInfo *childrel;
+	bool is_first = true;
+
+	Assert(IS_PARTITIONED_REL(rel));
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	if (childrels == NIL)
+		return;
+
+	childrel = linitial_node(RelOptInfo, childrels);
+	foreach(lc, childrel->indexlist)
+	{
+		IndexOptInfo *ind = lfirst(lc);
+		IndexOptInfo *global_ind;
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		global_ind = simple_copy_indexinfo_to_parent(rel, ind);
+		/*
+		 * If the unique index doesn't contain the partkey, then it is unique
+		 * on this partition only, so it is useless for us
+		 */
+		if (!index_constains_partkey(rel, global_ind))
+			continue;
+		global_unique_indexlist = lappend(global_unique_indexlist,  global_ind);
+	}
+
+	/* Fast path */
+	if (global_unique_indexlist == NIL)
+		return;
+
+	foreach(lc, childrels)
+	{
+		RelOptInfo *child = lfirst(lc);
+		if (is_first)
+		{
+			is_first = false;
+			continue;
+		}
+		adjust_partition_unique_indexlist(rel, child, &global_unique_indexlist);
+	}
+
+	/* Now we have the unique index list which is exactly the same on all childrels,
+	 * set the UniqueKey just as if it were a non-partitioned table.
+	 */
+	populate_baserel_uniquekeys(root, rel, global_unique_indexlist);
+}
+
+
+/*
+ * populate_distinctrel_uniquekeys
+ */
+void
+populate_distinctrel_uniquekeys(PlannerInfo *root,
+									 RelOptInfo *inputrel,
+									 RelOptInfo *distinctrel)
+{
+	/* The unique keys before the distinct are still valid */
+	distinctrel->uniquekeys = list_copy(inputrel->uniquekeys);
+	add_uniquekey_from_sortgroups(root, distinctrel, root->parse->distinctClause);
+}
+
+/*
+ * populate_grouprel_uniquekeys
+ */
+void
+populate_grouprel_uniquekeys(PlannerInfo *root,
+							 RelOptInfo *grouprel)
+{
+	Query *parse = root->parse;
+	if (parse->hasTargetSRFs)
+		return;
+	if (parse->groupingSets != NIL)
+		return;
+	if (parse->groupClause != NIL)
+		add_uniquekey_from_sortgroups(root,
+									  grouprel,
+									  root->parse->groupClause);
+	else
+		/* it has aggregation but without a GROUP BY, so it must return one row */
+		add_uniquekey_for_onerow(grouprel);
+}
+
+/*
+ * simple_copy_uniquekeys
+ * Using a function for this one-line code makes it easy to check where we
+ * simply copied the uniquekeys.
+ */
+void
+simple_copy_uniquekeys(RelOptInfo *oldrel,
+					   RelOptInfo *newrel)
+{
+	newrel->uniquekeys = oldrel->uniquekeys;
+}
+
+/*
+ *  populate_unionrel_uniquiekeys
+ */
+extern void populate_unionrel_uniquiekeys(PlannerInfo *root,
+										  RelOptInfo *unionrel)
+{
+	ListCell	*lc;
+	List	*exprs = NIL;
+	List	*colnos = NIL;
+	int i = 1;
+
+	Assert(unionrel->uniquekeys == NIL);
+
+	foreach(lc,  unionrel->reltarget->exprs)
+	{
+		exprs = lappend(exprs, lfirst(lc));
+		colnos = lappend_int(colnos, i);
+		i++;
+	}
+	unionrel->uniquekeys = lappend(unionrel->uniquekeys,
+								   makeUniqueKey(exprs, colnos, true));
+}
+
+/*
+ * Examine 'restrictlist' to determine if rel2 can duplicate rows in rel1 and
+ * vice-versa.  If the relation at the other side of the join cannot
+ * cause row duplication, then tag the uniquekeys for the relation onto
+ * 'joinrel's uniquekey list.
+ * Also, for the joinrel, the combinations of uks from both sides are unique
+ * as well, no matter the join clauses.
+ */
+void
+populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List *restrictlist, JoinType jointype)
+{
+	ListCell *lc, *lc2;
+	List	*clause_list = NIL;
+	List	*outerrel_uniquekey_ctx;
+	List	*innerrel_uniquekey_ctx;
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	/* Care about the left relation only for SEMI/ANTI join */
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+	{
+		foreach(lc, outerrel->uniquekeys)
+		{
+			UniqueKey	*uniquekey = lfirst_node(UniqueKey, lc);
+			if (list_all_members_in(uniquekey->exprs, joinrel->reltarget->exprs))
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, uniquekey);
+		}
+		return;
+	}
+    /* fast path */
+	if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+		return;
+	outerrel_uniquekey_ctx = initililze_unqiuecontext_for_joinrel(joinrel, outerrel);
+	innerrel_uniquekey_ctx = initililze_unqiuecontext_for_joinrel(joinrel, innerrel);
+
+	if (outerrel_uniquekey_ctx == NIL || innerrel_uniquekey_ctx == NIL)
+		return;
+
+	clause_list = gather_mergeable_joinclauses(joinrel, outerrel, innerrel,
+											   restrictlist, jointype);
+
+	if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+	{
+		/* XXX Looks like we don't need to filter out the grantee=false case?
+		 * create table m1(a int primary key, b int);
+		 * create unique index m1_uk_b on m1(b);
+		 * create table m2(a int primary key, b int);
+		 *
+		 * select m1.a from m1 left join m2 on (m1.b = m2.b);
+		 * m1.b is a uniquekey with grantee=false,
+		 * but m1.a is still a valid uniquekey for the joinrel.
+		 */
+		foreach(lc, innerrel_uniquekey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+			if (!list_all_members_in(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+			/*
+			 * XXX Even for left join/full join type, we still can keep the
+			 * uniquekey->grantee changed.
+			 */
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys, ctx->uniquekey);
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	if (innerrel_keeps_unique(root, innerrel, outerrel, clause_list, true))
+	{
+		foreach(lc, outerrel_uniquekey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+			if (!list_all_members_in(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys, ctx->uniquekey);
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	/* The combinations of uks from both sides are unique as well, but don't
+	 * bother to add one if a subset of it has been added already.
+	 */
+	foreach(lc, outerrel_uniquekey_ctx)
+	{
+		UniqueKeyContext context1 = (UniqueKeyContext) lfirst(lc);
+		if (context1->added_to_joinrel || !context1->useful)
+			continue;
+		foreach(lc2, innerrel_uniquekey_ctx)
+		{
+			UniqueKeyContext context2 = (UniqueKeyContext) lfirst(lc2);
+			List	*exprs = NIL, *colnos = NIL;
+			if (context2->added_to_joinrel || !context2->useful)
+				continue;
+			exprs = list_copy(context1->uniquekey->exprs);
+			colnos = list_copy(context1->uniquekey->positions);
+			exprs = list_concat(exprs, context2->uniquekey->exprs);
+			colnos = list_concat(colnos, context2->uniquekey->positions);
+			/* Only if both sides are grantee=true can we say the combination
+			 * is grantee=true.  Suppose m1.b is a unique key with grantee=true,
+			 * and m2.b is a unique key with grantee=false;
+			 * we can have data m1 ((1), (2)) and m2 ((null), (null)).
+			 */
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys, makeUniqueKey(exprs, colnos,
+																			 context1->uniquekey->grantee &&
+																			 context2->uniquekey->grantee));
+		}
+	}
+}
+
+
+/*
+ * Used to avoid multiple scans of rel->reltarget->exprs. See
+ * convert_subquery_uniquekeys for details.
+ */
+typedef struct SubqueryUniqueKeyData
+{
+	/*
+	 * Only a Var referencing the subquery's unique column is unique as well;
+	 * we can't guarantee others.
+	 */
+	Var *var;
+
+	/* the position of the var in the rel->reltarget */
+	int pos;
+} *SubqueryUniqueKeyContext;
+
+/*
+ * convert_subquery_uniquekeys
+ *
+ * currel is the RelOptInfo for the subquery RTE in the outer query;
+ * sub_final_rel is the RelOptInfo we get from fetch_upper_rel for the subquery.
+
+ *
+ * select distinct m2.b, m1.a from m1, (select abs(b) from m2 group by abs(b)) m2;
+ */
+void convert_subquery_uniquekeys(PlannerInfo *root,
+								 RelOptInfo *currel,
+								 RelOptInfo *sub_final_rel)
+{
+	SubqueryUniqueKeyContext *ctx_array;
+	SubqueryUniqueKeyContext ctx;
+	Index max_colno_subq = 0;
+	ListCell	*lc, *lc2;
+	int pos = 0;
+
+	if (sub_final_rel->uniquekeys == NIL)
+		/* This should be a common case */
+		return;
+
+	/*
+	 * In fact we can check this with list_length(sub_final_rel->reltarget->exprs),
+	 * However, reltarget is not set on UPPERREL_FINAL relation, so do it like this
+	 */
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey * ukey = lfirst_node(UniqueKey, lc);
+		foreach(lc2, ukey->positions)
+		{
+			Index colno = lfirst_int(lc2);
+			if (max_colno_subq < colno)
+				max_colno_subq = colno;
+		}
+	}
+
+	Assert(max_colno_subq > 0);
+	ctx_array = palloc0(sizeof(SubqueryUniqueKeyContext *) * (max_colno_subq + 1));
+
+	foreach(lc, currel->reltarget->exprs)
+	{
+		Var *var;
+		int colno_subq;
+		pos++;
+		if (!IsA(lfirst(lc), Var))
+			continue;
+
+		var = lfirst_node(Var, lc);
+		colno_subq = var->varattno;
+		if (colno_subq > max_colno_subq)
+			/* we don't allocate the memory for it and it is useless for us */
+			continue;
+		ctx_array[colno_subq] = palloc0(sizeof(struct SubqueryUniqueKeyData));
+		ctx = ctx_array[colno_subq]; /* corresponding to subquery's uniquekey->positions[x] */
+		ctx->pos = pos; /* the position in current targetlist,  will be used to set UnqiueKey */
+		ctx->var = var;
+	}
+
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey * ukey = lfirst_node(UniqueKey, lc);
+		bool uniquekey_useful = true;
+		List	*exprs = NIL;
+		List	*colnos = NIL;
+		foreach(lc2, ukey->positions)
+		{
+			Index sub_colno = lfirst_int(lc2);
+			ctx = ctx_array[sub_colno];
+			if (ctx == NULL)
+			{
+				/* the uk is not used outside */
+				uniquekey_useful = false;
+				break;
+			}
+			exprs = lappend(exprs, ctx->var);
+			colnos = lappend_int(colnos, ctx->pos);
+		}
+		if (uniquekey_useful)
+			currel->uniquekeys = lappend(currel->uniquekeys,
+										 makeUniqueKey(exprs, colnos, ukey->grantee));
+	}
+}
+
+
+/*
+ * innerrel_keeps_unique
+ *
+ * Check if the unique key of the innerrel is still valid after the join.
+ * The innerrel's UniqueKey remains valid if the innerrel's columns mergeop
+ * the outerrel's uniquekey, so we need to find the outerrel's uniquekey in the clause_list.
+ * Note: the clause_list must be a list of mergeable restrictinfo.
+ */
+static bool
+innerrel_keeps_unique(PlannerInfo *root,
+					  RelOptInfo *outerrel,
+					  RelOptInfo *innerrel,
+					  List *clause_list,
+					  bool reverse)
+{
+	ListCell	*lc, *lc2, *lc3;
+	if (outerrel->uniquekeys == NIL || innerrel->uniquekeys == NIL)
+		return false;
+
+	foreach(lc, outerrel->uniquekeys)
+	{
+		List	*outer_uq_exprs = lfirst_node(UniqueKey, lc)->exprs;
+		bool clauselist_matchs_all_exprs = true;
+		foreach(lc2, outer_uq_exprs)
+		{
+			Node *outer_uq_expr = lfirst(lc2);
+			bool find_uq_expr_in_clauselist = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *outer_expr;
+				if (reverse)
+					outer_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				else
+					outer_expr = rinfo->outer_is_left ? get_leftop(rinfo->clause) : get_rightop(rinfo->clause);
+				if (equal(outer_expr, outer_uq_expr))
+				{
+					find_uq_expr_in_clauselist = true;
+					break;
+				}
+			}
+			if (!find_uq_expr_in_clauselist)
+			{
+				/* No need to check the next exprs in the current uniquekey */
+				clauselist_matchs_all_exprs = false;
+				break;
+			}
+		}
+
+		if (clauselist_matchs_all_exprs)
+			/* If the clauselist matches any uk from the outerrel, the innerrel will
+			 * be unique, based on the fact that innerrel->uniquekeys != NIL which is
+			 * checked at the beginning.
+			 */
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * relation_has_uniquekeys_for
+ *		Returns true if we have proofs that 'rel' cannot return multiple rows with
+ *		the same values in each of 'exprs'.  Otherwise returns false.
+ */
+bool
+relation_has_uniquekeys_for(PlannerInfo *root, RelOptInfo *rel, List *exprs)
+{
+	ListCell *lc;
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		if (!ukey->grantee)
+			continue;
+		if (list_all_members_in(ukey->exprs, exprs))
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * Examine the rel's restriction clauses for usable var = const clauses
+ */
+static List*
+gather_mergeable_baserestrictlist(RelOptInfo *rel)
+{
+	List	*restrictlist = NIL;
+	ListCell	*lc;
+	foreach(lc, rel->baserestrictinfo)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
+
+		/*
+		 * Note: can_join won't be set for a restriction clause, but
+		 * mergeopfamilies will be if it has a mergejoinable operator and
+		 * doesn't contain volatile functions.
+		 */
+		if (restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * The clause certainly doesn't refer to anything but the given rel.
+		 * If either side is pseudoconstant then we can use it.
+		 */
+		if (bms_is_empty(restrictinfo->left_relids))
+		{
+			/* righthand side is inner */
+			restrictinfo->outer_is_left = true;
+		}
+		else if (bms_is_empty(restrictinfo->right_relids))
+		{
+			/* lefthand side is inner */
+			restrictinfo->outer_is_left = false;
+		}
+		else
+			continue;
+
+		/* OK, add to list */
+		restrictlist = lappend(restrictlist, restrictinfo);
+	}
+	return restrictlist;
+}
+
+
+/*
+ * gather_mergeable_joinclauses
+ *	  Gather the mergejoinable join clauses usable for this
+ *	  outerrel/innerrel pair.
+ */
+static List*
+gather_mergeable_joinclauses(RelOptInfo *joinrel,
+							 RelOptInfo *outerrel,
+							 RelOptInfo *innerrel,
+							 List *restrictlist,
+							 JoinType jointype)
+{
+	List	*clause_list = NIL;
+	ListCell	*lc;
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *)lfirst(lc);
+		if (IS_OUTER_JOIN(jointype) &&
+			RINFO_IS_PUSHED_DOWN(restrictinfo, joinrel->relids))
+			continue;
+
+		/* Ignore if it's not a mergejoinable clause */
+		if (!restrictinfo->can_join ||
+			restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * Check if clause has the form "outer op inner" or "inner op outer",
+		 * and if so mark which side is inner.
+		 */
+		if (!clause_sides_match_join(restrictinfo, outerrel->relids, innerrel->relids))
+			continue;			/* no good for these input relations */
+
+		/* OK, add to list */
+		clause_list = lappend(clause_list, restrictinfo);
+	}
+	return clause_list;
+}
+
+
+/*
+ * Return true if every column of the unique index is matched by a
+ * "col = Const" clause in the restrictlist.
+ */
+static bool
+match_index_to_baserestrictinfo(IndexOptInfo *unique_ind, List *restrictlist)
+{
+	int c = 0;
+
+	/* A fast path to avoid the nested-loop scan below */
+	if (list_length(restrictlist) < unique_ind->ncolumns)
+		return false;
+
+	for(c = 0;  c < unique_ind->ncolumns; c++)
+	{
+		ListCell	*lc;
+		bool	found_in_restrictinfo = false;
+		foreach(lc, restrictlist)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			Node	   *rexpr;
+
+			/*
+			 * The condition's equality operator must be a member of the
+			 * index opfamily, else it is not asserting the right kind of
+			 * equality behavior for this index.  We check this first
+			 * since it's probably cheaper than match_index_to_operand().
+			 */
+			if (!list_member_oid(rinfo->mergeopfamilies, unique_ind->opfamily[c]))
+				continue;
+
+			/*
+			 * XXX at some point we may need to check collations here too.
+			 * For the moment we assume all collations reduce to the same
+			 * notion of equality.
+			 */
+
+			/* OK, see if the condition operand matches the index key */
+			if (rinfo->outer_is_left)
+				rexpr = get_rightop(rinfo->clause);
+			else
+				rexpr = get_leftop(rinfo->clause);
+
+			if (match_index_to_operand(rexpr, c, unique_ind))
+			{
+				found_in_restrictinfo = true;
+				break;
+			}
+		}
+		if (!found_in_restrictinfo)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * add_uniquekey_from_index
+ * 	We only add the index keys whose exprs exist in rel->reltarget
+ */
+static void
+add_uniquekey_from_index(RelOptInfo *rel, IndexOptInfo *unique_index)
+{
+	int	c;
+	List	*exprs = NIL;
+	List	*positions = NIL;
+	bool	grantee = true;
+
+	/* Only add the index whose exprs all exist in rel->reltarget */
+	for(c = 0; c < unique_index->ncolumns; c++)
+	{
+		ListCell	*lc;
+		bool	find_in_exprs = false;
+		foreach(lc, rel->reltarget->exprs)
+		{
+			Var *var;
+			/* We never know whether a FuncExpr is nullable or not, so we only handle Vars */
+			if (!IsA(lfirst(lc), Var))
+				continue;
+			var = lfirst_node(Var, lc);
+			if (var->varattno < InvalidAttrNumber)
+				/* System column */
+				continue;
+			/* Must check NOT NULL for a unique index */
+			if (!bms_is_member(var->varattno, rel->not_null_cols))
+				grantee = false;
+
+			/* To keep the uniquekey short, we only add it if it exists in rel->reltarget->exprs */
+			if (match_index_to_operand((Node *)lfirst(lc), c, unique_index))
+			{
+				find_in_exprs = true;
+				exprs = lappend(exprs, lfirst(lc));
+				positions = lappend_int(positions, c+1);
+				break;
+			}
+		}
+		if (!find_in_exprs)
+			return;
+	}
+
+	if (exprs != NIL)
+	{
+		rel->uniquekeys = lappend(rel->uniquekeys,
+								  makeUniqueKey(exprs, positions, grantee));
+	}
+}
+
+
+/*
+ * add_uniquekey_for_onerow
+ * If we are sure the relation returns only one row, then all the columns
+ * are unique.
+ */
+void
+add_uniquekey_for_onerow(RelOptInfo *rel)
+{
+	int c = 1;
+	ListCell	*lc;
+	foreach(lc,  rel->reltarget->exprs)
+	{
+		/*
+		 * Every column in this relation is unique since only 1 row is
+		 * returned.  No need to check whether it is a Var or nullable;
+		 * we can guarantee the uniqueness.
+		 */
+		rel->uniquekeys = lappend(rel->uniquekeys,
+								  makeUniqueKey(list_make1(lfirst(lc)),
+												list_make1_int(c),
+												true));
+		c++;
+	}
+}
+
+/*
+ * initililze_unqiuecontext_for_joinrel
+ * Return a List of UniqueKeyContext for an inputrel.  We also filter out
+ * all the uniquekeys which are not possible to use later.
+ */
+static List *
+initililze_unqiuecontext_for_joinrel(RelOptInfo *joinrel,  RelOptInfo *inputrel)
+{
+	List	*res = NIL;
+	ListCell *lc;
+	foreach(lc,  inputrel->uniquekeys)
+	{
+		UniqueKeyContext context;
+		context = palloc(sizeof(struct UniqueKeyContextData));
+		context->uniquekey = lfirst_node(UniqueKey, lc);
+		context->added_to_joinrel = false;
+		context->useful = true;
+		res = lappend(res, context);
+	}
+	return res;
+}
+
+/*
+ * clause_sides_match_join
+ *	  Determine whether a join clause is of the right form to use in this join.
+ *
+ * We already know that the clause is a binary opclause referencing only the
+ * rels in the current join.  The point here is to check whether it has the
+ * form "outerrel_expr op innerrel_expr" or "innerrel_expr op outerrel_expr",
+ * rather than mixing outer and inner vars on either side.  If it matches,
+ * we set the transient flag outer_is_left to identify which side is which.
+ */
+static bool
+clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
+						Relids innerrelids)
+{
+	if (bms_is_subset(rinfo->left_relids, outerrelids) &&
+		bms_is_subset(rinfo->right_relids, innerrelids))
+	{
+		/* lefthand side is outer */
+		rinfo->outer_is_left = true;
+		return true;
+	}
+	else if (bms_is_subset(rinfo->left_relids, innerrelids) &&
+			 bms_is_subset(rinfo->right_relids, outerrelids))
+	{
+		/* righthand side is outer */
+		rinfo->outer_is_left = false;
+		return true;
+	}
+	return false;				/* no good for these input relations */
+}
+
+
+/*
+ * Partitioned table Unique Keys.
+ * A partitioned table's unique key is maintained as follows:
+ * 1. The index must be unique as usual.
+ * 2. The index must contain the partition key.
+ * 3. The index must exist on all the child rels.  See simple_indexinfo_equal
+ *    for how we compare them.
+ */
+
+/* index_constains_partkey
+ * Return true if the index contains the partition key.
+ */
+static bool
+index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind)
+{
+	ListCell	*lc;
+	int	i;
+	Assert(IS_PARTITIONED_REL(partrel));
+
+	for(i = 0; i < partrel->part_scheme->partnatts; i++)
+	{
+		Node *part_expr = linitial(partrel->partexprs[i]);
+		bool found_in_index = false;
+		foreach(lc, ind->indextlist)
+		{
+			Expr *index_expr = lfirst_node(TargetEntry, lc)->expr;
+			if (equal(index_expr, part_expr))
+			{
+				found_in_index = true;
+				break;
+			}
+		}
+		if (!found_in_index)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * simple_indexinfo_equal
+ *
+ * Check whether two IndexOptInfos are the same.  The indexes here are
+ * COPIED from the childrels with some tiny changes (see
+ * simple_copy_indexinfo_to_parent).
+ */
+static bool
+simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2)
+{
+	Size oid_cmp_len = sizeof(Oid) * ind1->ncolumns;
+	return ind1->ncolumns == ind2->ncolumns &&
+		ind1->unique == ind2->unique &&
+		memcmp(ind1->indexkeys, ind2->indexkeys, sizeof(int) * ind1->ncolumns) == 0 &&
+		memcmp(ind1->opfamily, ind2->opfamily, oid_cmp_len) == 0 &&
+		memcmp(ind1->opcintype, ind2->opcintype, oid_cmp_len) == 0 &&
+		memcmp(ind1->sortopfamily, ind2->sortopfamily, oid_cmp_len) == 0 &&
+		equal(ind1->indextlist, ind2->indextlist);
+}
+
+/*
+ * These macros are copied from copyfuncs.c.  We don't make
+ * simple_copy_indexinfo_to_parent public since it is such a customized copy.
+ */
+
+#define COPY_POINTER_FIELD(fldname, sz) \
+	do { \
+		Size	_size = (sz); \
+		newnode->fldname = palloc(_size); \
+		memcpy(newnode->fldname, from->fldname, _size); \
+	} while (0)
+
+#define COPY_NODE_FIELD(fldname) \
+	(newnode->fldname = copyObjectImpl(from->fldname))
+
+#define COPY_SCALAR_FIELD(fldname) \
+	(newnode->fldname = from->fldname)
+
+
+/*
+ * simple_copy_indexinfo_to_parent
+ * Copy the IndexOptInfo from a childrel to the parent, which will be used to:
+ * 1. Test whether the same index exists in all the childrels.
+ * 2. Test whether the parentrel's reltarget/baserestrictinfo matches this index.
+ * The copied and modified index is only used within this scope.
+ */
+static IndexOptInfo *
+simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+								IndexOptInfo *from)
+{
+	IndexOptInfo *newnode = makeNode(IndexOptInfo);
+
+	COPY_SCALAR_FIELD(ncolumns);
+	COPY_SCALAR_FIELD(nkeycolumns);
+	COPY_SCALAR_FIELD(unique);
+	COPY_SCALAR_FIELD(immediate);
+	/* We just need to know if it is NIL or not */
+	COPY_SCALAR_FIELD(indpred);
+	COPY_SCALAR_FIELD(predOK);
+	COPY_POINTER_FIELD(indexkeys, from->ncolumns * sizeof(int));
+	COPY_POINTER_FIELD(indexcollations, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opfamily, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opcintype, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(sortopfamily, from->ncolumns * sizeof(Oid));
+	COPY_NODE_FIELD(indextlist);
+
+	/*
+	 * We have to change the varnos so that the later index matching
+	 * (like pk = 1) against rel->reltarget works.
+	 */
+	ChangeVarNodes((Node*) newnode->indextlist,
+				   from->rel->relid,
+				   parentrel->relid, 0);
+	newnode->rel = parentrel;
+	return newnode;
+}
+
+/*
+ * adjust_partition_unique_indexlist
+ *
+ * Check every index in the current known unique_index_list to see whether
+ * it also exists in the given childrel; if not, it is removed from the
+ * list.
+ */
+static void
+adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+								  RelOptInfo *childrel,
+								  List **global_unique_index)
+{
+	ListCell	*lc, *lc2;
+	foreach(lc, *global_unique_index)
+	{
+		IndexOptInfo	*g_ind = lfirst_node(IndexOptInfo, lc);
+		bool found_in_child = false;
+
+		foreach(lc2, childrel->indexlist)
+		{
+			IndexOptInfo   *p_ind = lfirst_node(IndexOptInfo, lc2);
+			IndexOptInfo   *p_ind_copy;
+			if (!p_ind->unique || !p_ind->immediate ||
+				(p_ind->indpred != NIL && !p_ind->predOK))
+				continue;
+			p_ind_copy = simple_copy_indexinfo_to_parent(parentrel, p_ind);
+			if (simple_indexinfo_equal(p_ind_copy, g_ind))
+			{
+				found_in_child = true;
+				break;
+			}
+		}
+
+		if (!found_in_child)
+			/* The same index doesn't exist in this childrel, so remove it */
+			*global_unique_index = foreach_delete_current(*global_unique_index, lc);
+	}
+}
+
+/* helper function for groupres/distinctrel */
+static void
+add_uniquekey_from_sortgroups(PlannerInfo *root, RelOptInfo *rel, List *sortgroups)
+{
+	Query *parse = root->parse;
+	ListCell *lc;
+	ListCell *lc2;
+	List	*exprs = NIL, *colnos = NIL;
+	foreach(lc, sortgroups)
+	{
+		Index sortref = lfirst_node(SortGroupClause, lc)->tleSortGroupRef;
+		int c = 1;
+		/* Use a separate ListCell here; reusing "lc" would clobber the outer loop */
+		foreach(lc2, parse->targetList)
+		{
+			TargetEntry *tle = lfirst_node(TargetEntry, lc2);
+			if (tle->ressortgroupref == sortref)
+			{
+				exprs = lappend(exprs, tle->expr);
+				colnos = lappend_int(colnos, c);
+			}
+			++c;
+		}
+	}
+	rel->uniquekeys = lappend(rel->uniquekeys, makeUniqueKey(exprs, colnos, true));
+}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0ff660284..b96c6290b7 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -37,7 +37,6 @@ static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
 								  Relids joinrelids);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
-static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
 								List *clause_list);
 static Oid	distinct_col_search(int colno, List *colnos, List *opids);
@@ -178,14 +177,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 
 	innerrel = find_base_rel(root, innerrelid);
 
-	/*
-	 * Before we go to the effort of checking whether any innerrel variables
-	 * are needed above the join, make a quick check to eliminate cases in
-	 * which we will surely be unable to prove uniqueness of the innerrel.
-	 */
-	if (!rel_supports_distinctness(root, innerrel))
-		return false;
-
 	/* Compute the relid set for the join we are considering */
 	joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
 
@@ -535,14 +526,6 @@ reduce_unique_semijoins(PlannerInfo *root)
 
 		innerrel = find_base_rel(root, innerrelid);
 
-		/*
-		 * Before we trouble to run generate_join_implied_equalities, make a
-		 * quick check to eliminate cases in which we will surely be unable to
-		 * prove uniqueness of the innerrel.
-		 */
-		if (!rel_supports_distinctness(root, innerrel))
-			continue;
-
 		/* Compute the relid set for the join we are considering */
 		joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
 
@@ -570,54 +553,6 @@ reduce_unique_semijoins(PlannerInfo *root)
 }
 
 
-/*
- * rel_supports_distinctness
- *		Could the relation possibly be proven distinct on some set of columns?
- *
- * This is effectively a pre-checking function for rel_is_distinct_for().
- * It must return true if rel_is_distinct_for() could possibly return true
- * with this rel, but it should not expend a lot of cycles.  The idea is
- * that callers can avoid doing possibly-expensive processing to compute
- * rel_is_distinct_for()'s argument lists if the call could not possibly
- * succeed.
- */
-static bool
-rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
-{
-	/* We only know about baserels ... */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * For a plain relation, we only know how to prove uniqueness by
-		 * reference to unique indexes.  Make sure there's at least one
-		 * suitable unique index.  It must be immediately enforced, and if
-		 * it's a partial index, it must match the query.  (Keep these
-		 * conditions in sync with relation_has_unique_index_for!)
-		 */
-		ListCell   *lc;
-
-		foreach(lc, rel->indexlist)
-		{
-			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
-
-			if (ind->unique && ind->immediate &&
-				(ind->indpred == NIL || ind->predOK))
-				return true;
-		}
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Query	   *subquery = root->simple_rte_array[rel->relid]->subquery;
-
-		/* Check if the subquery has any qualities that support distinctness */
-		if (query_supports_distinctness(subquery))
-			return true;
-	}
-	/* We have no proof rules for any other rtekinds. */
-	return false;
-}
 
 /*
  * rel_is_distinct_for
@@ -640,83 +575,34 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 static bool
 rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
 {
-	/*
-	 * We could skip a couple of tests here if we assume all callers checked
-	 * rel_supports_distinctness first, but it doesn't seem worth taking any
-	 * risk for.
-	 */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * Examine the indexes to see if we have a matching unique index.
-		 * relation_has_unique_index_for automatically adds any usable
-		 * restriction clauses for the rel, so we needn't do that here.
-		 */
-		if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
-			return true;
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Index		relid = rel->relid;
-		Query	   *subquery = root->simple_rte_array[relid]->subquery;
-		List	   *colnos = NIL;
-		List	   *opids = NIL;
-		ListCell   *l;
 
-		/*
-		 * Build the argument lists for query_is_distinct_for: a list of
-		 * output column numbers that the query needs to be distinct over, and
-		 * a list of equality operators that the output columns need to be
-		 * distinct according to.
-		 *
-		 * (XXX we are not considering restriction clauses attached to the
-		 * subquery; is that worth doing?)
-		 */
-		foreach(l, clause_list)
+	ListCell	*lc1, *lc2,  *lc3;
+	foreach(lc1,  rel->uniquekeys)
+	{
+		UniqueKey *uqk = lfirst_node(UniqueKey, lc1);
+		bool all_uqk_exprs_found = true;
+		foreach(lc2, uqk->exprs)
 		{
-			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
-			Oid			op;
-			Var		   *var;
-
-			/*
-			 * Get the equality operator we need uniqueness according to.
-			 * (This might be a cross-type operator and thus not exactly the
-			 * same operator the subquery would consider; that's all right
-			 * since query_is_distinct_for can resolve such cases.)  The
-			 * caller's mergejoinability test should have selected only
-			 * OpExprs.
-			 */
-			op = castNode(OpExpr, rinfo->clause)->opno;
-
-			/* caller identified the inner side for us */
-			if (rinfo->outer_is_left)
-				var = (Var *) get_rightop(rinfo->clause);
-			else
-				var = (Var *) get_leftop(rinfo->clause);
-
-			/*
-			 * We may ignore any RelabelType node above the operand.  (There
-			 * won't be more than one, since eval_const_expressions() has been
-			 * applied already.)
-			 */
-			if (var && IsA(var, RelabelType))
-				var = (Var *) ((RelabelType *) var)->arg;
-
-			/*
-			 * If inner side isn't a Var referencing a subquery output column,
-			 * this clause doesn't help us.
-			 */
-			if (!var || !IsA(var, Var) ||
-				var->varno != relid || var->varlevelsup != 0)
-				continue;
-
-			colnos = lappend_int(colnos, var->varattno);
-			opids = lappend_oid(opids, op);
+			Node *uq_expr = lfirst(lc2);
+			bool find_uq_exprs_in_clause_list = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *clause_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				if (equal(uq_expr, clause_expr))
+				{
+					find_uq_exprs_in_clause_list = true;
+					break;
+				}
+			}
+			if (!find_uq_exprs_in_clause_list)
+			{
+				all_uqk_exprs_found = false;
+				break;
+			}
 		}
-
-		if (query_is_distinct_for(subquery, colnos, opids))
+		if (all_uqk_exprs_found)
+			/* This UniqueKey is what we want */
 			return true;
 	}
 	return false;
@@ -976,13 +862,6 @@ innerrel_is_unique(PlannerInfo *root,
 	if (restrictlist == NIL)
 		return false;
 
-	/*
-	 * Make a quick check to eliminate cases in which we will surely be unable
-	 * to prove uniqueness of the innerrel.
-	 */
-	if (!rel_supports_distinctness(root, innerrel))
-		return false;
-
 	/*
 	 * Query the cache to see if we've managed to prove that innerrel is
 	 * unique for any subset of this outerrel.  We don't need an exact match,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e978b491f6..be78d061ae 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -830,6 +830,15 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		{
 			Node	   *qual = (Node *) lfirst(l);
 
+			/* Set the NOT NULL info now */
+			ListCell	*lc;
+			foreach(lc, find_nonnullable_vars(qual))
+			{
+				Var *var = lfirst_node(Var, lc);
+				RelOptInfo *rel = root->simple_rel_array[var->varno];
+				if (var->varattno > InvalidAttrNumber)
+					rel->not_null_cols = bms_add_member(rel->not_null_cols, var->varattno);
+			}
 			distribute_qual_to_rels(root, qual,
 									false, below_outer_join, JOIN_INNER,
 									root->qual_security_level,
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..6ad73cb57b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -213,19 +213,6 @@ query_planner(PlannerInfo *root,
 	 */
 	fix_placeholder_input_needed_levels(root);
 
-	/*
-	 * Remove any useless outer joins.  Ideally this would be done during
-	 * jointree preprocessing, but the necessary information isn't available
-	 * until we've built baserel data structures and classified qual clauses.
-	 */
-	joinlist = remove_useless_joins(root, joinlist);
-
-	/*
-	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
-	 * Likewise, this can't be done until now for lack of needed info.
-	 */
-	reduce_unique_semijoins(root);
-
 	/*
 	 * Now distribute "placeholders" to base rels as needed.  This has to be
 	 * done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b65abf6046..8043fc4382 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2386,6 +2386,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		add_path(final_rel, path);
 	}
 
+	simple_copy_uniquekeys(current_rel, final_rel);
+
 	/*
 	 * Generate partial paths for final_rel, too, if outer query levels might
 	 * be able to make use of them.
@@ -3810,6 +3812,23 @@ create_grouping_paths(PlannerInfo *root,
 	Query	   *parse = root->parse;
 	RelOptInfo *grouped_rel;
 	RelOptInfo *partially_grouped_rel;
+	List	*required_unique_keys = NIL;
+
+	if (root->parse->groupingSets == NIL)
+	{
+		required_unique_keys  = get_sortgrouplist_exprs(parse->groupClause,
+														parse->targetList);
+		/*
+		 * If the groupby clauses are unique already, the grouping node is
+		 * not necessary if there are no aggregate functions.
+		 */
+		if (required_unique_keys != NIL &&
+			!parse->hasAggs &&
+			!parse->hasWindowFuncs &&
+			parse->havingQual == NULL &&
+			relation_has_uniquekeys_for(root, input_rel, required_unique_keys))
+			return input_rel;
+	}
 
 	/*
 	 * Create grouping relation to hold fully aggregated grouping and/or
@@ -3898,6 +3917,8 @@ create_grouping_paths(PlannerInfo *root,
 	}
 
 	set_cheapest(grouped_rel);
+
+	populate_grouprel_uniquekeys(root, grouped_rel);
 	return grouped_rel;
 }
 
@@ -4615,7 +4636,7 @@ create_window_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(window_rel);
-
+	simple_copy_uniquekeys(input_rel, window_rel);
 	return window_rel;
 }
 
@@ -4734,6 +4755,12 @@ create_distinct_paths(PlannerInfo *root,
 	bool		allow_hash;
 	Path	   *path;
 	ListCell   *lc;
+	List	   *required_unique_keys =  get_sortgrouplist_exprs(parse->distinctClause,
+																parse->targetList);
+
+	/* If the result is unique already, just return the input_rel directly */
+	if (relation_has_uniquekeys_for(root, input_rel, required_unique_keys))
+		return input_rel;
 
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
@@ -4912,7 +4939,7 @@ create_distinct_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(distinct_rel);
-
+	populate_distinctrel_uniquekeys(root, input_rel, distinct_rel);
 	return distinct_rel;
 }
 
@@ -5060,6 +5087,8 @@ create_ordered_paths(PlannerInfo *root,
 	 */
 	Assert(ordered_rel->pathlist != NIL);
 
+	simple_copy_uniquekeys(input_rel, ordered_rel);
+
 	return ordered_rel;
 }
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 951aed80e7..8aa4d24cb0 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -689,6 +689,8 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	/* Undo effects of possibly forcing tuple_fraction to 0 */
 	root->tuple_fraction = save_fraction;
 
+	/* Add the UniqueKeys */
+	populate_unionrel_uniquiekeys(root, result_rel);
 	return result_rel;
 }
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d82fc5ab8b..34d30b181c 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -460,6 +461,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->not_null_cols == NULL);
+	for(i = 0; i < relation->rd_att->natts; i++)
+	{
+		if (relation->rd_att->attrs[i].attnotnull)
+			rel->not_null_cols = bms_add_member(rel->not_null_cols, i+1);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 31d9aedeeb..d4de97016c 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -16,6 +16,7 @@
 
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 
 extern A_Expr *makeA_Expr(A_Expr_Kind kind, List *name,
@@ -105,4 +106,5 @@ extern GroupingSet *makeGroupingSet(GroupingSetKind kind, List *content, int loc
 
 extern VacuumRelation *makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols);
 
+extern UniqueKey* makeUniqueKey(List *exprs, List *positions, bool grantee);
 #endif							/* MAKEFUNC_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8a76afe8cc..679cc4cc9c 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ceb809644..7b23b4ec58 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -687,6 +687,7 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	Relids		not_null_cols;	/* NOT NULL columns of this relation; attnos start from 1 */
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
@@ -706,6 +707,7 @@ typedef struct RelOptInfo
 	QualCost	baserestrictcost;	/* cost of evaluating the above */
 	Index		baserestrict_min_security;	/* min security_level found in
 											 * baserestrictinfo */
+	List	   *uniquekeys;		/* List of UniqueKey */
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
 	bool		has_eclass_joins;	/* T means joininfo is incomplete */
@@ -1017,6 +1019,23 @@ typedef struct PathKey
 } PathKey;
 
 
+/*
+ * UniqueKey
+ *
+ * Represents the unique properties held by a RelOptInfo.
+ * exprs is a list of exprs which are unique on the current RelOptInfo.
+ * positions is a list of the locations of the corresponding exprs in the
+ * current reloptinfo->reltarget.  It is used to translate the exprs' info
+ * when crossing a subquery boundary.
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+	List	   *exprs;
+	List	   *positions;
+	bool		grantee;
+} UniqueKey;
+
 /*
  * PathTarget
  *
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 14ea2766ad..5dfb93895c 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -528,6 +528,8 @@ extern bool list_member_ptr(const List *list, const void *datum);
 extern bool list_member_int(const List *list, int datum);
 extern bool list_member_oid(const List *list, Oid datum);
 
+extern bool list_all_members_in(const List *members, const List *target);
+
 extern List *list_delete(List *list, void *datum);
 extern List *list_delete_ptr(List *list, void *datum);
 extern List *list_delete_int(List *list, int datum);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..f1967d15c2 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -240,4 +240,41 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ */
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel,
+										List* unique_index_list);
+extern void populate_partitionedrel_uniquekeys(PlannerInfo *root,
+												RelOptInfo *rel,
+												List *childrels);
+extern void populate_distinctrel_uniquekeys(PlannerInfo *root,
+											RelOptInfo *inputrel,
+											RelOptInfo *distinctrel);
+extern void populate_grouprel_uniquekeys(PlannerInfo *root,
+										 RelOptInfo *grouprel);
+extern void populate_unionrel_uniquiekeys(PlannerInfo *root,
+										  RelOptInfo *unionrel);
+extern void simple_copy_uniquekeys(RelOptInfo *oldrel,
+								   RelOptInfo *newrel);
+extern void convert_subquery_uniquekeys(PlannerInfo *root,
+										RelOptInfo *currel,
+										RelOptInfo *sub_final_rel);
+extern void populate_joinrel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *rel1,
+										RelOptInfo *rel2,
+										List *restrictlist,
+										JoinType jointype);
+
+extern bool innerl_is_unique_v2(PlannerInfo *root,
+								RelOptInfo *outerrel,
+								RelOptInfo *innerrel,
+								List *restrictlist);
+extern bool relation_has_uniquekeys_for(PlannerInfo *root,
+										RelOptInfo *rel,
+										List *exprs);
+
 #endif							/* PATHS_H */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 14cdcfcca6..4a701af85b 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -870,14 +870,12 @@ explain (costs off)
   select distinct max(unique2) from tenk1;
                              QUERY PLAN                              
 ---------------------------------------------------------------------
- HashAggregate
-   Group Key: $0
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
                  Index Cond: (unique2 IS NOT NULL)
-   ->  Result
-(7 rows)
+(5 rows)
 
 select distinct max(unique2) from tenk1;
  max  
@@ -1036,7 +1034,7 @@ explain (costs off)
   select distinct min(f1), max(f1) from minmaxtest;
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
- Unique
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Merge Append
@@ -1059,10 +1057,7 @@ explain (costs off)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
-   ->  Sort
-         Sort Key: ($0), ($1)
-         ->  Result
-(26 rows)
+(23 rows)
 
 select distinct min(f1), max(f1) from minmaxtest;
  min | max 
@@ -1092,12 +1087,10 @@ create temp table t2 (x int, y int, z int, primary key (x, y));
 create temp table t3 (a int, b int, c int, primary key(a, b) deferrable);
 -- Non-primary-key columns can be removed from GROUP BY
 explain (costs off) select * from t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 -- No removal can happen if the complete PK is not present in GROUP BY
 explain (costs off) select a,c from t1 group by a,c,d;
@@ -1112,31 +1105,27 @@ explain (costs off) select a,c from t1 group by a,c,d;
 explain (costs off) select *
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.y
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.z
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Cannot optimize when PK is deferrable
 explain (costs off) select * from t3 group by a,b,c;
@@ -1161,12 +1150,10 @@ explain (costs off) select * from t1 group by a,b,c,d;
 
 -- Okay to remove columns if we're only querying the parent.
 explain (costs off) select * from only t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 create temp table p_t1 (
   a int,
@@ -1179,14 +1166,12 @@ create temp table p_t1_1 partition of p_t1 for values in(1);
 create temp table p_t1_2 partition of p_t1 for values in(2);
 -- Ensure we can remove non-PK columns for partitioned tables.
 explain (costs off) select * from p_t1 group by a,b,c,d;
-           QUERY PLAN           
---------------------------------
- HashAggregate
-   Group Key: p_t1.a, p_t1.b
-   ->  Append
-         ->  Seq Scan on p_t1_1
-         ->  Seq Scan on p_t1_2
-(5 rows)
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on p_t1_1
+   ->  Seq Scan on p_t1_2
+(3 rows)
 
 drop table t1 cascade;
 NOTICE:  drop cascades to table t1c
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 761376b007..6e5b8f83f4 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4414,37 +4414,25 @@ select d.* from d left join (select distinct * from b) s
 -- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
 -- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
 -- but this happens too late for join removal in the outer plan level.)
+-- XXXX: since b.id is unique now, the GROUP BY clause is erased, so
+-- the join removal can happen now.
 explain (costs off)
 select d.* from d left join (select * from b group by b.id, b.c_id) s
   on d.a = s.id;
-                QUERY PLAN                
-------------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Group
-         Group Key: b.id
-         ->  Index Scan using b_pkey on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(8 rows)
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
 
 -- similarly, but keying off a DISTINCT clause
+-- XXX: we erase the distinct clause, so the join removal is ok as well.
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(9 rows)
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 11c6f50fbf..e9263d6151 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -306,3 +306,398 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+create table select_distinct_a(pk1 int, pk2 char(20),  uk1 char(20) not null,  uk2 int, e int, primary key(pk1, pk2));
+create unique index select_distinct_a_uk on select_distinct_a(uk1, uk2);
+create table select_distinct_b(a int, b char(20), pk1 char(20), pk2 int, e int, primary key(pk1, pk2));
+-- distinct erased since (pk1, pk2) is the primary key
+explain (costs off) select distinct * from select_distinct_a;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+(1 row)
+
+-- distinct can't be erased since we require all the uk columns to be not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a;
+             QUERY PLAN              
+-------------------------------------
+ HashAggregate
+   Group Key: uk1, uk2
+   ->  Seq Scan on select_distinct_a
+(3 rows)
+
+-- distinct erased since uk + not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 IS NOT NULL)
+(2 rows)
+
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 > 1;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 > 1)
+(2 rows)
+
+-- distinct erased due to group by
+explain (costs off) select distinct e from select_distinct_a group by e;
+             QUERY PLAN              
+-------------------------------------
+ HashAggregate
+   Group Key: e
+   ->  Seq Scan on select_distinct_a
+(3 rows)
+
+-- distinct erased due to the restrictinfo
+explain (costs off) select distinct uk1 from select_distinct_a where pk1 = 1 and pk2 = 'c';
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Index Scan using select_distinct_a_pkey on select_distinct_a
+   Index Cond: ((pk1 = 1) AND (pk2 = 'c'::bpchar))
+(2 rows)
+
+-- test join
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+insert into select_distinct_a values(1, 'a', 'a', 0, 1), (1, 'b', 'A', 0, 2), (3, 'c', 'c', 0, 3);
+insert into select_distinct_b values(1, 'a', 'a', 0, 1), (4, 'd', 'd', 0, 4), (1, 'e', 'e', 0, 5);
+ANALYZE select_distinct_a;
+ANALYZE select_distinct_b;
+-- Cartesian join
+explain (costs off) select distinct a.uk1, a.uk2, b.pk1, b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null;
+                 QUERY PLAN                  
+---------------------------------------------
+ Nested Loop
+   ->  Seq Scan on select_distinct_a a
+         Filter: (uk2 IS NOT NULL)
+   ->  Materialize
+         ->  Seq Scan on select_distinct_b b
+(5 rows)
+
+select distinct a.uk1 COLLATE "C", a.uk2, b.pk1 COLLATE "C", b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null order by 1, 2, 3, 4;
+         uk1          | uk2 |         pk1          | pk2 
+----------------------+-----+----------------------+-----
+ A                    |   0 | a                    |   0
+ A                    |   0 | d                    |   0
+ A                    |   0 | e                    |   0
+ a                    |   0 | a                    |   0
+ a                    |   0 | d                    |   0
+ a                    |   0 | e                    |   0
+ c                    |   0 | a                    |   0
+ c                    |   0 | d                    |   0
+ c                    |   0 | e                    |   0
+(9 rows)
+
+-- normal join
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_a t1, select_distinct_b t2 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Nested Loop
+   Join Filter: ((t1.pk2 = t2.pk1) AND (t1.e = t2.pk2))
+   ->  Seq Scan on select_distinct_a t1
+   ->  Materialize
+         ->  Seq Scan on select_distinct_b t2
+(5 rows)
+
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_b t2, select_distinct_a t1 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Nested Loop
+   Join Filter: ((t2.pk1 = t1.pk2) AND (t2.pk2 = t1.e))
+   ->  Seq Scan on select_distinct_b t2
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a t1
+(5 rows)
+
+-- left join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a);
+                 QUERY PLAN                  
+---------------------------------------------
+ Nested Loop Left Join
+   Join Filter: (a.pk1 = b.a)
+   ->  Seq Scan on select_distinct_a a
+   ->  Materialize
+         ->  Seq Scan on select_distinct_b b
+(5 rows)
+
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+ pk1 |         pk2          |         pk1          | pk2 
+-----+----------------------+----------------------+-----
+   1 | a                    | a                    |   0
+   1 | a                    | e                    |   0
+   1 | b                    | a                    |   0
+   1 | b                    | e                    |   0
+   3 | c                    |                      |    
+(5 rows)
+
+-- right join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a);
+                 QUERY PLAN                  
+---------------------------------------------
+ Nested Loop Left Join
+   Join Filter: (a.pk1 = b.a)
+   ->  Seq Scan on select_distinct_b b
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a a
+(5 rows)
+
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+ pk1 |         pk2          |         pk1          | pk2 
+-----+----------------------+----------------------+-----
+   1 | a                    | a                    |   0
+   1 | a                    | e                    |   0
+   1 | b                    | a                    |   0
+   1 | b                    | e                    |   0
+     |                      | d                    |   0
+(5 rows)
+
+-- full join
+explain (costs off)  select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a);
+                 QUERY PLAN                  
+---------------------------------------------
+ Hash Full Join
+   Hash Cond: (a.pk1 = b.a)
+   ->  Seq Scan on select_distinct_a a
+   ->  Hash
+         ->  Seq Scan on select_distinct_b b
+(5 rows)
+
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+ pk1 |         pk2          |         pk1          | pk2 
+-----+----------------------+----------------------+-----
+   1 | a                    | a                    |   0
+   1 | a                    | e                    |   0
+   1 | b                    | a                    |   0
+   1 | b                    | e                    |   0
+   3 | c                    |                      |    
+     |                      | d                    |   0
+(6 rows)
+
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1 from select_distinct_a a inner join select_distinct_b b on (a.pk1 = b.a);
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: a.pk1, a.pk2, b.pk1
+         ->  Nested Loop
+               Join Filter: (a.pk1 = b.a)
+               ->  Seq Scan on select_distinct_a a
+               ->  Materialize
+                     ->  Seq Scan on select_distinct_b b
+(8 rows)
+
+-- Semi/anti join
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 in (select a from select_distinct_b);
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Nested Loop Semi Join
+   Join Filter: (select_distinct_a.pk1 = select_distinct_b.a)
+   ->  Seq Scan on select_distinct_a
+   ->  Materialize
+         ->  Seq Scan on select_distinct_b
+(5 rows)
+
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 not in (select a from select_distinct_b);
+              QUERY PLAN               
+---------------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on select_distinct_b
+(4 rows)
+
+-- Test subquery
+explain (costs off) select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Nested Loop
+   Join Filter: (a.pk1 = select_distinct_b.a)
+   ->  Seq Scan on select_distinct_a a
+   ->  Materialize
+         ->  HashAggregate
+               Group Key: select_distinct_b.a
+               ->  Seq Scan on select_distinct_b
+(7 rows)
+
+select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a order by 1, 2, 3;
+ pk1 |         pk2          |         uk1          | uk2 | e | a 
+-----+----------------------+----------------------+-----+---+---
+   1 | a                    | a                    |   0 | 1 | 1
+   1 | b                    | A                    |   0 | 2 | 1
+(2 rows)
+
+explain (costs off) select distinct * from select_distinct_a a,  (select distinct a from select_distinct_b) b where a.pk1 = b.a;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Nested Loop
+   Join Filter: (a.pk1 = select_distinct_b.a)
+   ->  Seq Scan on select_distinct_a a
+   ->  Materialize
+         ->  HashAggregate
+               Group Key: select_distinct_b.a
+               ->  Seq Scan on select_distinct_b
+(7 rows)
+
+select distinct * from select_distinct_a a, (select distinct a from select_distinct_b) b where a.pk1 = b.a order by 1, 2, 3;
+ pk1 |         pk2          |         uk1          | uk2 | e | a 
+-----+----------------------+----------------------+-----+---+---
+   1 | a                    | a                    |   0 | 1 | 1
+   1 | b                    | A                    |   0 | 2 | 1
+(2 rows)
+
+-- Distinct On
+-- can't erase since pk2 is missing
+explain (costs off) select distinct on(pk1) pk1, pk2 from select_distinct_a;
+                QUERY PLAN                 
+-------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: pk1
+         ->  Seq Scan on select_distinct_a
+(4 rows)
+
+-- ok to erase
+explain (costs off) select distinct on(pk1, pk2) pk1, pk2 from select_distinct_a;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+(1 row)
+
+-- Test partitioned table
+create table dist_p (a int, b int not null, c int not null, d int) partition by  range (b);
+create table dist_p0 partition of dist_p for values from (1) to (10);
+create table dist_p1 partition of dist_p for values from (11) to (20);
+-- The target list combines UKs from both sides
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.maxc from select_distinct_a t1, (select b, max(c) as maxc from dist_p group by b) t2;
+                   QUERY PLAN                   
+------------------------------------------------
+ Nested Loop
+   ->  HashAggregate
+         Group Key: dist_p.b
+         ->  Append
+               ->  Seq Scan on dist_p0 dist_p_1
+               ->  Seq Scan on dist_p1 dist_p_2
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a t1
+(8 rows)
+
+-- no uk for t2, so distinct is needed.
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: t1.pk1, t1.pk2, t2.b, t2.c
+   ->  Nested Loop
+         ->  Append
+               ->  Seq Scan on dist_p0 t2_1
+               ->  Seq Scan on dist_p1 t2_2
+         ->  Materialize
+               ->  Seq Scan on select_distinct_a t1
+(8 rows)
+
+-- create unique index on dist_p
+create unique index dist_p_uk_b_c on dist_p(b, c);
+-- (t2.b, t2.c) should be the UK
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                  QUERY PLAN                  
+----------------------------------------------
+ Nested Loop
+   ->  Append
+         ->  Seq Scan on dist_p0 t2_1
+         ->  Seq Scan on dist_p1 t2_2
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a t1
+(6 rows)
+
+drop index dist_p_uk_b_c;
+-- we also support creating a unique index on each child table
+create unique index dist_p0_uk_bc on dist_p0(b, c);
+-- not ok, since dist_p1 does not have the same index
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: t1.pk1, t1.pk2, t2.b, t2.c
+   ->  Nested Loop
+         ->  Append
+               ->  Seq Scan on dist_p0 t2_1
+               ->  Seq Scan on dist_p1 t2_2
+         ->  Materialize
+               ->  Seq Scan on select_distinct_a t1
+(8 rows)
+
+create unique index dist_p1_uk_bc on dist_p1(b, c);
+-- OK now
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                  QUERY PLAN                  
+----------------------------------------------
+ Nested Loop
+   ->  Append
+         ->  Seq Scan on dist_p0 t2_1
+         ->  Seq Scan on dist_p1 t2_2
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a t1
+(6 rows)
+
+-- the uk is the same on all child tables, but it doesn't include the partkey, so still not ok.
+create unique index dist_p0_uk_c on dist_p0(c);
+create unique index dist_p1_uk_c on dist_p1(c);
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.c from select_distinct_a t1, dist_p t2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: t1.pk1, t1.pk2, t2.c
+   ->  Nested Loop
+         ->  Append
+               ->  Seq Scan on dist_p0 t2_1
+               ->  Seq Scan on dist_p1 t2_2
+         ->  Materialize
+               ->  Seq Scan on select_distinct_a t1
+(8 rows)
+
+drop table dist_p;
+-- test some view.
+create view distinct_v1 as select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+explain (costs off) select * from distinct_v1;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 IS NOT NULL)
+(2 rows)
+
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) select * from distinct_v1;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: select_distinct_a.uk1, select_distinct_a.uk2
+         ->  Seq Scan on select_distinct_a
+               Filter: (uk2 IS NOT NULL)
+(5 rows)
+
+alter table select_distinct_a alter column uk1 set not null;
+-- test generic plan
+prepare pt as select * from distinct_v1;
+explain (costs off)  execute pt;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 IS NOT NULL)
+(2 rows)
+
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) execute pt;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: select_distinct_a.uk1, select_distinct_a.uk2
+         ->  Seq Scan on select_distinct_a
+               Filter: (uk2 IS NOT NULL)
+(5 rows)
+
+drop view distinct_v1;
+drop table select_distinct_a;
+drop table select_distinct_b;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 5fc6617369..d6cc8fa845 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1515,11 +1515,14 @@ select d.* from d left join (select distinct * from b) s
 -- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
 -- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
 -- but this happens too late for join removal in the outer plan level.)
+-- XXXX: since b.id is unique now, the GROUP BY clause is erased, so
+-- the join removal can happen now.
 explain (costs off)
 select d.* from d left join (select * from b group by b.id, b.c_id) s
   on d.a = s.id;
 
 -- similarly, but keying off a DISTINCT clause
+-- XXX: we erase the distinct clause, so the join removal is ok as well.
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 33102744eb..08d1e35095 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -135,3 +135,128 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+create table select_distinct_a(pk1 int, pk2 char(20),  uk1 char(20) not null,  uk2 int, e int, primary key(pk1, pk2));
+create unique index select_distinct_a_uk on select_distinct_a(uk1, uk2);
+create table select_distinct_b(a int, b char(20), pk1 char(20), pk2 int, e int, primary key(pk1, pk2));
+
+-- distinct erased since (pk1, pk2) is the primary key
+explain (costs off) select distinct * from select_distinct_a;
+
+-- distinct can't be erased since we require all the uk columns to be not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a;
+
+-- distinct erased since uk + not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 > 1;
+
+-- distinct erased due to group by
+explain (costs off) select distinct e from select_distinct_a group by e;
+
+-- distinct erased due to the restrictinfo
+explain (costs off) select distinct uk1 from select_distinct_a where pk1 = 1 and pk2 = 'c';
+
+-- test join
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+insert into select_distinct_a values(1, 'a', 'a', 0, 1), (1, 'b', 'A', 0, 2), (3, 'c', 'c', 0, 3);
+insert into select_distinct_b values(1, 'a', 'a', 0, 1), (4, 'd', 'd', 0, 4), (1, 'e', 'e', 0, 5);
+
+ANALYZE select_distinct_a;
+ANALYZE select_distinct_b;
+
+-- Cartesian join
+explain (costs off) select distinct a.uk1, a.uk2, b.pk1, b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null;
+select distinct a.uk1 COLLATE "C", a.uk2, b.pk1 COLLATE "C", b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null order by 1, 2, 3, 4;
+
+-- normal join
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_a t1, select_distinct_b t2 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_b t2, select_distinct_a t1 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+
+-- left join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a);
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+
+-- right join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a);
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+
+-- full join
+explain (costs off)  select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a);
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1 from select_distinct_a a inner join select_distinct_b b on (a.pk1 = b.a);
+
+
+-- Semi/anti join
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 in (select a from select_distinct_b);
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 not in (select a from select_distinct_b);
+
+
+-- Test subquery
+explain (costs off) select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a;
+select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a order by 1, 2, 3;
+
+explain (costs off) select distinct * from select_distinct_a a,  (select distinct a from select_distinct_b) b where a.pk1 = b.a;
+select distinct * from select_distinct_a a, (select distinct a from select_distinct_b) b where a.pk1 = b.a order by 1, 2, 3;
+
+-- Distinct On
+-- can't erase since pk2 is missing
+explain (costs off) select distinct on(pk1) pk1, pk2 from select_distinct_a;
+-- ok to erase
+explain (costs off) select distinct on(pk1, pk2) pk1, pk2 from select_distinct_a;
+
+
+-- Test partitioned table
+create table dist_p (a int, b int not null, c int not null, d int) partition by  range (b);
+
+create table dist_p0 partition of dist_p for values from (1) to (10);
+create table dist_p1 partition of dist_p for values from (11) to (20);
+
+-- The target list combines UKs from both sides
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.maxc from select_distinct_a t1, (select b, max(c) as maxc from dist_p group by b) t2;
+
+-- no uk for t2, so distinct is needed.
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+
+-- create unique index on dist_p
+create unique index dist_p_uk_b_c on dist_p(b, c);
+
+-- (t2.b, t2.c) should be the UK
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+
+drop index dist_p_uk_b_c;
+
+-- we also support creating a unique index on each child table
+create unique index dist_p0_uk_bc on dist_p0(b, c);
+-- not ok, since dist_p1 does not have the same index
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+create unique index dist_p1_uk_bc on dist_p1(b, c);
+-- OK now
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+
+-- the uk is the same on all child tables, but it doesn't include the partkey, so still not ok.
+create unique index dist_p0_uk_c on dist_p0(c);
+create unique index dist_p1_uk_c on dist_p1(c);
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.c from select_distinct_a t1, dist_p t2;
+
+drop table dist_p;
+
+-- test some view.
+create view distinct_v1 as select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+explain (costs off) select * from distinct_v1;
+
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) select * from distinct_v1;
+
+alter table select_distinct_a alter column uk1 set not null;
+
+-- test generic plan
+prepare pt as select * from distinct_v1;
+explain (costs off)  execute pt;
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) execute pt;
+
+drop view distinct_v1;
+drop table select_distinct_a;
+drop table select_distinct_b;
-- 
2.21.0

#3Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#2)
1 attachment(s)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Because I replied to the old thread, cfbot ran a test based on the old patch
there. I have detached the old thread from the commitfest, and I'm replying
to this email again to wake up Mr. cfbot with the right patch.

Attachments:

v2-0001-Maintain-the-uniqueness-of-a-Query-from-bottom-to.patchapplication/octet-stream; name=v2-0001-Maintain-the-uniqueness-of-a-Query-from-bottom-to.patchDownload
From e8bdec93824d3cf186b4e7ad15f077e199d010a2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 23 Mar 2020 17:58:00 +0800
Subject: [PATCH v2] Maintain the uniqueness of a Query from bottom to top. The
 UniqueKey

can be used to erase distinct and group-by clauses and to improve remove_useless_join
& reduce_unique_semijoins
---
 .../postgres_fdw/expected/postgres_fdw.out    |   36 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    2 +
 src/backend/nodes/equalfuncs.c                |    3 +-
 src/backend/nodes/list.c                      |   27 +
 src/backend/nodes/makefuncs.c                 |   15 +
 src/backend/optimizer/path/Makefile           |    3 +-
 src/backend/optimizer/path/allpaths.c         |   25 +-
 src/backend/optimizer/path/joinrels.c         |    2 +
 src/backend/optimizer/path/uniquekeys.c       | 1005 +++++++++++++++++
 src/backend/optimizer/plan/analyzejoins.c     |  171 +--
 src/backend/optimizer/plan/initsplan.c        |    9 +
 src/backend/optimizer/plan/planmain.c         |   13 -
 src/backend/optimizer/plan/planner.c          |   33 +-
 src/backend/optimizer/prep/prepunion.c        |    2 +
 src/backend/optimizer/util/plancat.c          |    8 +
 src/include/nodes/makefuncs.h                 |    2 +
 src/include/nodes/nodes.h                     |    1 +
 src/include/nodes/pathnodes.h                 |   19 +
 src/include/nodes/pg_list.h                   |    2 +
 src/include/optimizer/paths.h                 |   37 +
 src/test/regress/expected/aggregates.out      |   83 +-
 src/test/regress/expected/join.out            |   34 +-
 src/test/regress/expected/select_distinct.out |  395 +++++++
 src/test/regress/sql/join.sql                 |    3 +
 src/test/regress/sql/select_distinct.sql      |  125 ++
 25 files changed, 1797 insertions(+), 258 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekeys.c

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 62c2697920..e50e548934 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2900,24 +2900,24 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 f
 (1 row)
 
 -- Outer query is aggregation query
+-- XXX: we get 200 rows from ft2 t2 where t2.c2 % 6 = 0, yet the overall query returns just 1 row, which looks strange.
+--      But since it works like a plain aggregation without a GROUP BY clause, I'd say the distinct path can be erased.
 explain (verbose, costs off)
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
- Unique
+                                                       QUERY PLAN                                                       
+------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: ((SubPlan 1))
-   ->  Sort
-         Output: ((SubPlan 1))
-         Sort Key: ((SubPlan 1))
-         ->  Foreign Scan
-               Output: (SubPlan 1)
-               Relations: Aggregate on (public.ft2 t2)
-               Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
-               SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+   Sort Key: ((SubPlan 1))
+   ->  Foreign Scan
+         Output: (SubPlan 1)
+         Relations: Aggregate on (public.ft2 t2)
+         Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
+         SubPlan 1
+           ->  Foreign Scan on public.ft1 t1
+                 Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                 Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(11 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
@@ -3416,7 +3416,6 @@ select sum(q.a), count(q.b) from ft4 left join (select 13, avg(ft1.c1), sum(ft2.
    Output: sum(q.a), count(q.b)
    ->  Nested Loop Left Join
          Output: q.a, q.b
-         Inner Unique: true
          Join Filter: ((ft4.c1)::numeric <= q.b)
          ->  Foreign Scan on public.ft4
                Output: ft4.c1, ft4.c2, ft4.c3
@@ -3429,7 +3428,7 @@ select sum(q.a), count(q.b) from ft4 left join (select 13, avg(ft1.c1), sum(ft2.
                            Output: 13, (avg(ft1.c1)), NULL::bigint
                            Relations: Aggregate on ((public.ft2) LEFT JOIN (public.ft1))
                            Remote SQL: SELECT 13, avg(r1."C 1"), NULL::bigint FROM ("S 1"."T 1" r2 LEFT JOIN "S 1"."T 1" r1 ON (((r1."C 1" = r2."C 1"))))
-(17 rows)
+(16 rows)
 
 select sum(q.a), count(q.b) from ft4 left join (select 13, avg(ft1.c1), sum(ft2.c1) from ft1 right join ft2 on (ft1.c1 = ft2.c1)) q(a, b, c) on (ft4.c1 <= q.b);
  sum | count 
@@ -4198,7 +4197,6 @@ explain (verbose, costs off) select * from ft3 f, loct3 l
 -------------------------------------------------------------
  Hash Join
    Output: f.f1, f.f2, f.f3, l.f1, l.f2, l.f3
-   Inner Unique: true
    Hash Cond: ((f.f3)::text = (l.f3)::text)
    ->  Foreign Scan on public.ft3 f
          Output: f.f1, f.f2, f.f3
@@ -4208,7 +4206,7 @@ explain (verbose, costs off) select * from ft3 f, loct3 l
          ->  Index Scan using loct3_f1_key on public.loct3 l
                Output: l.f1, l.f2, l.f3
                Index Cond: (l.f1 = 'foo'::text)
-(12 rows)
+(11 rows)
 
 -- ===================================================================
 -- test writable foreign table stuff
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..10faec0cce 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -746,6 +746,8 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 f
 select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 from ft1 where c2 = 6 group by c2;
 
 -- Outer query is aggregation query
+-- XXX: we get 200 rows from ft2 t2 where t2.c2 % 6 = 0, yet the overall query returns just 1 row, which looks strange.
+--      But since it works like a plain aggregation without a GROUP BY clause, I'd say the distinct path can be erased.
 explain (verbose, costs off)
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 88b912977e..63e92d94ef 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -162,8 +162,9 @@ _equalIntoClause(const IntoClause *a, const IntoClause *b)
 static bool
 _equalVar(const Var *a, const Var *b)
 {
-	COMPARE_SCALAR_FIELD(varno);
+	/* Compare varattno first since it has higher selectivity than varno */
 	COMPARE_SCALAR_FIELD(varattno);
+	COMPARE_SCALAR_FIELD(varno);
 	COMPARE_SCALAR_FIELD(vartype);
 	COMPARE_SCALAR_FIELD(vartypmod);
 	COMPARE_SCALAR_FIELD(varcollid);
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index bd0c58cd81..9c121f5d75 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -688,6 +688,33 @@ list_member_oid(const List *list, Oid datum)
 	return false;
 }
 
+/*
+ * Return true iff there is an equal member in target for every
+ * member in members.
+ */
+bool
+list_all_members_in(const List *members, const List *target)
+{
+	const ListCell	*lc1, *lc2;
+	if (target == NIL && members != NIL)
+		return false;
+	foreach(lc1, members)
+	{
+		bool found = false;
+		foreach(lc2, target)
+		{
+			if (equal(lfirst(lc1), lfirst(lc2)))
+			{
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			return false;
+	}
+	return true;
+}
+
 /*
  * Delete the n'th cell (counting from 0) in list.
  *
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index e8cdc90c31..163d352c4f 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -809,3 +809,18 @@ makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols)
 	v->va_cols = va_cols;
 	return v;
 }
+
+
+/*
+ * makeUniqueKey
+ */
+UniqueKey *
+makeUniqueKey(List *exprs, List *positions, bool grantee)
+{
+	UniqueKey  *ukey = makeNode(UniqueKey);
+	Assert(list_length(exprs) == list_length(positions));
+	ukey->exprs = exprs;
+	ukey->positions = positions;
+	ukey->grantee = grantee;
+	return ukey;
+}
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..7b9820c25f 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekeys.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 905bbe77d8..e7383979aa 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -39,6 +39,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
@@ -222,14 +223,23 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	set_base_rel_pathlists(root);
 
 	/*
-	 * Generate access paths for the entire join tree.
+	 * Remove any useless outer joins.  Ideally this would be done during
+	 * jointree preprocessing, but the necessary information isn't available
+	 * until we've built baserel data structures, classified qual clauses
+	 * and uniquekeys
 	 */
-	rel = make_rel_from_joinlist(root, joinlist);
+	joinlist = remove_useless_joins(root, joinlist);
 
 	/*
-	 * The result should join all and only the query's base rels.
+	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
+	 * Likewise, this can't be done until now for lack of needed info.
 	 */
-	Assert(bms_equal(rel->relids, root->all_baserels));
+	reduce_unique_semijoins(root);
+
+	/*
+	 * Generate access paths for the entire join tree.
+	 */
+	rel = make_rel_from_joinlist(root, joinlist);
 
 	return rel;
 }
@@ -786,6 +796,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
+
+	/* Set UniqueKeys for this relation */
+	populate_baserel_uniquekeys(root, rel, rel->indexlist);
 }
 
 /*
@@ -1276,6 +1289,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Add paths to the append relation. */
 	add_paths_to_append_rel(root, rel, live_childrels);
+	if (IS_PARTITIONED_REL(rel))
+		populate_partitionedrel_uniquekeys(root, rel, live_childrels);
 }
 
 
@@ -2349,6 +2364,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 										  pathkeys, required_outer));
 	}
 
+	convert_subquery_uniquekeys(root, rel, sub_final_rel);
+
 	/* If outer rel allows parallelism, do same for partial paths. */
 	if (rel->consider_parallel && bms_is_empty(required_outer))
 	{
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index a21c295b99..c6799aa48c 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -920,6 +920,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
 
 	/* Apply partitionwise join technique, if possible. */
 	try_partitionwise_join(root, rel1, rel2, joinrel, sjinfo, restrictlist);
+
+	populate_joinrel_uniquekeys(root, joinrel, rel1, rel2, restrictlist, sjinfo->jointype);
 }
 
 
diff --git a/src/backend/optimizer/path/uniquekeys.c b/src/backend/optimizer/path/uniquekeys.c
new file mode 100644
index 0000000000..0eb904212a
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekeys.c
@@ -0,0 +1,1005 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekeys.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "rewrite/rewriteManip.h"
+
+
+/*
+ * This struct is used to help populate_joinrel_uniquekeys.
+ * added_to_joinrel is set to true once a uniquekey has been added to joinrel.
+ * For a joinrel, if both sides have a UniqueKey, then their combination
+ * must be unique for the joinrel as well, but we don't need to add it if
+ * either of them has been added to joinrel already.  We use this struct to
+ * maintain such info.
+ */
+typedef struct UniqueKeyContextData
+{
+	UniqueKey	*uniquekey;
+	/* Set to true if the unique key has been added to joinrel->uniquekeys */
+	bool	added_to_joinrel;
+	/* If this uniquekey is still useful after join */
+	bool	useful;
+} *UniqueKeyContext;
+
+
+static List *gather_mergeable_baserestrictlist(RelOptInfo *rel);
+static List *gather_mergeable_joinclauses(RelOptInfo *joinrel,
+										  RelOptInfo *rel1,
+										  RelOptInfo *rel2,
+										  List *restirctlist,
+										  JoinType jointype);
+static bool match_index_to_baserestrictinfo(IndexOptInfo *unique_ind,
+											List *restrictlist);
+static List *initililze_unqiuecontext_for_joinrel(RelOptInfo *joinrel,
+												  RelOptInfo *inputrel);
+
+static bool innerrel_keeps_unique(PlannerInfo *root,
+								  RelOptInfo *outerrel,
+								  RelOptInfo *innerrel,
+								  List *restrictlist,
+								  bool reverse);
+static bool clause_sides_match_join(RestrictInfo *rinfo,
+									Relids outerrelids,
+									Relids innerrelids);
+static void add_uniquekey_from_index(RelOptInfo *rel,
+									 IndexOptInfo *unique_index);
+static void add_uniquekey_for_onerow(RelOptInfo *rel);
+
+/* Used for unique-index checking on partitioned tables */
+static bool index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind);
+static IndexOptInfo *simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+													 IndexOptInfo *from);
+static bool simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2);
+static void adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+											  RelOptInfo *childrel,
+											  List **global_unique_index);
+/* Helper function for groupres/distinctrel */
+static void add_uniquekey_from_sortgroups(PlannerInfo *root,
+										  RelOptInfo *rel,
+										  List *sortgroups);
+
+/*
+ * populate_baserel_uniquekeys
+ *		Populate 'baserel''s uniquekeys list by looking at the rel's unique
+ *		indexes and baserestrictinfo.
+ */
+void
+populate_baserel_uniquekeys(PlannerInfo *root,
+							RelOptInfo *baserel,
+							List *indexlist)
+{
+	ListCell *lc;
+	List	*restrictlist = gather_mergeable_baserestrictlist(baserel);
+	bool	return_one_row = false;
+	List	*matched_uk_indexes = NIL;
+
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	if (root->parse->hasTargetSRFs)
+		return;
+	foreach(lc, indexlist)
+	{
+		IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		if (match_index_to_baserestrictinfo(ind, restrictlist))
+		{
+			return_one_row = true;
+			break;
+		}
+		/* We can't guarantee an expression will not return a NULL */
+		if (ind->indexprs == NIL)
+			matched_uk_indexes = lappend(matched_uk_indexes, ind);
+	}
+
+	if (return_one_row)
+	{
+		/*
+		 * Since only 1 row is returned, any column is unique
+		 */
+		add_uniquekey_for_onerow(baserel);
+	}
+	else
+	{
+		foreach(lc, matched_uk_indexes)
+			add_uniquekey_from_index(baserel, lfirst_node(IndexOptInfo, lc));
+	}
+}
+
+
+/*
+ * populate_partitionedrel_uniquekeys
+ * A unique index can be used for a UniqueKey only if:
+ * 1). it includes the partition keys, and
+ * 2). all the childrels have the same index.
+ */
+void
+populate_partitionedrel_uniquekeys(PlannerInfo *root,
+									RelOptInfo *rel,
+									List *childrels)
+{
+	ListCell	*lc;
+	List	*global_unique_indexlist = NIL;
+	RelOptInfo *childrel;
+	bool is_first = true;
+
+	Assert(IS_PARTITIONED_REL(rel));
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	if (childrels == NIL)
+		return;
+
+	childrel = linitial_node(RelOptInfo, childrels);
+	foreach(lc, childrel->indexlist)
+	{
+		IndexOptInfo *ind = lfirst(lc);
+		IndexOptInfo *global_ind;
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		global_ind = simple_copy_indexinfo_to_parent(rel, ind);
+		/*
+		 * If the unique index doesn't contain the partition key, then it is
+		 * unique on this partition only, so it is useless for us.
+		 */
+		if (!index_constains_partkey(rel, global_ind))
+			continue;
+		global_unique_indexlist = lappend(global_unique_indexlist,  global_ind);
+	}
+
+	/* Fast path */
+	if (global_unique_indexlist == NIL)
+		return;
+
+	foreach(lc, childrels)
+	{
+		RelOptInfo *child = lfirst(lc);
+		if (is_first)
+		{
+			is_first = false;
+			continue;
+		}
+		adjust_partition_unique_indexlist(rel, child, &global_unique_indexlist);
+	}
+
+	/*
+	 * Now we have the list of unique indexes that are exactly the same on
+	 * all childrels, so set the UniqueKeys just like for a non-partitioned
+	 * table.
+	 */
+	populate_baserel_uniquekeys(root, rel, global_unique_indexlist);
+}
+
+
+/*
+ * populate_distinctrel_uniquekeys
+ */
+void
+populate_distinctrel_uniquekeys(PlannerInfo *root,
+									 RelOptInfo *inputrel,
+									 RelOptInfo *distinctrel)
+{
+	/* The unique keys from before the distinct are still valid */
+	distinctrel->uniquekeys = list_copy(inputrel->uniquekeys);
+	add_uniquekey_from_sortgroups(root, distinctrel, root->parse->distinctClause);
+}
+
+/*
+ * populate_grouprel_uniquekeys
+ */
+void
+populate_grouprel_uniquekeys(PlannerInfo *root,
+							 RelOptInfo *grouprel)
+{
+	Query *parse = root->parse;
+	if (parse->hasTargetSRFs)
+		return;
+	if (parse->groupingSets != NIL)
+		return;
+	if (parse->groupClause != NIL)
+		add_uniquekey_from_sortgroups(root,
+									  grouprel,
+									  root->parse->groupClause);
+	else
+		/* it has aggregation but no GROUP BY, so only one row is returned */
+		add_uniquekey_for_onerow(grouprel);
+}
+
+/*
+ * simple_copy_uniquekeys
+ * Using a function for this one-liner makes it easy to find the places
+ * where we simply copy the uniquekeys.
+ */
+void
+simple_copy_uniquekeys(RelOptInfo *oldrel,
+					   RelOptInfo *newrel)
+{
+	newrel->uniquekeys = oldrel->uniquekeys;
+}
+
+/*
+ * populate_unionrel_uniquiekeys
+ */
+void
+populate_unionrel_uniquiekeys(PlannerInfo *root, RelOptInfo *unionrel)
+{
+	ListCell	*lc;
+	List	*exprs = NIL;
+	List	*colnos = NIL;
+	int i = 1;
+
+	Assert(unionrel->uniquekeys == NIL);
+
+	foreach(lc,  unionrel->reltarget->exprs)
+	{
+		exprs = lappend(exprs, lfirst(lc));
+		colnos = lappend_int(colnos, i);
+		i++;
+	}
+	unionrel->uniquekeys = lappend(unionrel->uniquekeys,
+								   makeUniqueKey(exprs, colnos, true));
+}
+
+/*
+ * populate_joinrel_uniquekeys
+ * Use 'restrictlist' to determine if rel2 can duplicate rows in rel1 and
+ * vice-versa.  If the relation at the other side of the join cannot
+ * cause row duplication, then tag the uniquekeys for the relation onto
+ * 'joinrel's uniquekey list.  Additionally, the combinations of unique
+ * keys from both sides are unique for the joinrel as well, no matter what
+ * the join clauses are.
+ */
+void
+populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List *restrictlist, JoinType jointype)
+{
+	ListCell *lc, *lc2;
+	List	*clause_list = NIL;
+	List	*outerrel_uniquekey_ctx;
+	List	*innerrel_uniquekey_ctx;
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	/* Care about the left relation only for SEMI/ANTI join */
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+	{
+		foreach(lc, outerrel->uniquekeys)
+		{
+			UniqueKey	*uniquekey = lfirst_node(UniqueKey, lc);
+			if (list_all_members_in(uniquekey->exprs, joinrel->reltarget->exprs))
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, uniquekey);
+		}
+		return;
+	}
+	/* fast path */
+	if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+		return;
+	outerrel_uniquekey_ctx = initililze_unqiuecontext_for_joinrel(joinrel, outerrel);
+	innerrel_uniquekey_ctx = initililze_unqiuecontext_for_joinrel(joinrel, innerrel);
+
+	if (outerrel_uniquekey_ctx == NIL || innerrel_uniquekey_ctx == NIL)
+		return;
+
+	clause_list = gather_mergeable_joinclauses(joinrel, outerrel, innerrel,
+											   restrictlist, jointype);
+
+	if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+	{
+		/* XXX Looks like we don't need to filter out the grantee=false case?
+		 * create table m1(a int primary key, b int);
+		 * create unique index m1_uk_b on m1(b);
+		 * create table m2(a int primary key, b int);
+		 *
+		 * select m1.a from m1 left join m2 on (m1.b = m2.b);
+		 * m1.b is a uniquekey with grantee=false,
+		 * but m1.a is still a valid uniquekey for the joinrel.
+		 */
+		foreach(lc, innerrel_uniquekey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+			if (!list_all_members_in(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+			/*
+			 * XXX Even for left/full join types, we can still keep
+			 * uniquekey->grantee unchanged.
+			 */
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys, ctx->uniquekey);
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	if (innerrel_keeps_unique(root, innerrel, outerrel, clause_list, true))
+	{
+		foreach(lc, outerrel_uniquekey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+			if (!list_all_members_in(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys, ctx->uniquekey);
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	/*
+	 * The combinations of unique keys from both sides are unique as well,
+	 * but don't bother adding one if a subset of it has been added already.
+	 */
+	foreach(lc, outerrel_uniquekey_ctx)
+	{
+		UniqueKeyContext context1 = (UniqueKeyContext) lfirst(lc);
+		if (context1->added_to_joinrel || !context1->useful)
+			continue;
+		foreach(lc2, innerrel_uniquekey_ctx)
+		{
+			UniqueKeyContext context2 = (UniqueKeyContext) lfirst(lc2);
+			List	*exprs = NIL, *colnos = NIL;
+			if (context2->added_to_joinrel || !context2->useful)
+				continue;
+			exprs = list_copy(context1->uniquekey->exprs);
+			colnos = list_copy(context1->uniquekey->positions);
+			exprs = list_concat(exprs, context2->uniquekey->exprs);
+			colnos = list_concat(colnos, context2->uniquekey->positions);
+			/*
+			 * Only if both sides are guaranteed can we say the combination
+			 * is guaranteed.  Suppose m1.b is a unique key with grantee=true
+			 * and m2.b is a unique key with grantee=false; we could have
+			 * data m1 ((1), (2)) and m2 ((null), (null)).
+			 */
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys, makeUniqueKey(exprs, colnos,
+																			 context1->uniquekey->grantee &&
+																			 context2->uniquekey->grantee));
+		}
+	}
+}
+
+
+/*
+ * Used to avoid multiple scans of rel->reltarget->exprs.  See
+ * convert_subquery_uniquekeys for details.
+ */
+typedef struct SubqueryUniqueKeyData
+{
+	/*
+	 * Only a Var referencing a subquery unique column is unique as well;
+	 * we can't guarantee anything else.
+	 */
+	Var *var;
+
+	/* the position of the var in the rel->reltarget */
+	int pos;
+} *SubqueryUniqueKeyContext;
+
+/*
+ * convert_subquery_uniquekeys
+ *
+ * currel is the RelOptInfo in the outer query which represents the
+ * subquery; sub_final_rel is the subquery's UPPERREL_FINAL RelOptInfo,
+ * obtained via fetch_upper_rel.  For example:
+ *
+ * select distinct m2.b, m1.a from m1, (select abs(b) from m2 group by abs(b)) m2;
+ */
+void convert_subquery_uniquekeys(PlannerInfo *root,
+								 RelOptInfo *currel,
+								 RelOptInfo *sub_final_rel)
+{
+	SubqueryUniqueKeyContext *ctx_array;
+	SubqueryUniqueKeyContext ctx;
+	Index max_colno_subq = 0;
+	ListCell	*lc, *lc2;
+	int pos = 0;
+
+	if (sub_final_rel->uniquekeys == NIL)
+		/* This should be a common case */
+		return;
+
+	/*
+	 * In fact we can check this with list_length(sub_final_rel->reltarget->exprs),
+	 * However, reltarget is not set on UPPERREL_FINAL relation, so do it like this
+	 */
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey * ukey = lfirst_node(UniqueKey, lc);
+		foreach(lc2, ukey->positions)
+		{
+			Index colno = lfirst_int(lc2);
+			if (max_colno_subq < colno)
+				max_colno_subq = colno;
+		}
+	}
+
+	Assert(max_colno_subq > 0);
+	ctx_array = palloc0(sizeof(SubqueryUniqueKeyContext) * (max_colno_subq + 1));
+
+	foreach(lc, currel->reltarget->exprs)
+	{
+		Var *var;
+		int colno_subq;
+		pos++;
+		if (!IsA(lfirst(lc), Var))
+			continue;
+
+		var = lfirst_node(Var, lc);
+		colno_subq = var->varattno;
+		if (colno_subq > max_colno_subq)
+			/* we don't allocate the memory for it and it is useless for us */
+			continue;
+		ctx_array[colno_subq] = palloc0(sizeof(struct SubqueryUniqueKeyData));
+		ctx = ctx_array[colno_subq]; /* corresponding to subquery's uniquekey->positions[x] */
+		ctx->pos = pos; /* the position in the current targetlist, used to set the UniqueKey */
+		ctx->var = var;
+	}
+
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey * ukey = lfirst_node(UniqueKey, lc);
+		bool uniquekey_useful = true;
+		List	*exprs = NIL;
+		List	*colnos = NIL;
+		foreach(lc2, ukey->positions)
+		{
+			Index sub_colno = lfirst_int(lc2);
+			ctx = ctx_array[sub_colno];
+			if (ctx == NULL)
+			{
+				/* the uk is not used outside */
+				uniquekey_useful = false;
+				break;
+			}
+			exprs = lappend(exprs, ctx->var);
+			colnos = lappend_int(colnos, ctx->pos);
+		}
+		if (uniquekey_useful)
+			currel->uniquekeys = lappend(currel->uniquekeys,
+										 makeUniqueKey(exprs, colnos, ukey->grantee));
+	}
+}
+
+
+/*
+ * innerrel_keeps_unique
+ *
+ * Check if the unique keys of the innerrel remain valid after the join.
+ * The innerrel's UniqueKeys will still be valid if the join clauses
+ * merge-join the innerrel's columns against an outerrel uniquekey, so we
+ * need to find out if an outerrel uniquekey is covered by the clause_list.
+ * Note: the clause_list must be a list of mergeable RestrictInfos.
+ */
+static bool
+innerrel_keeps_unique(PlannerInfo *root,
+					  RelOptInfo *outerrel,
+					  RelOptInfo *innerrel,
+					  List *clause_list,
+					  bool reverse)
+{
+	ListCell	*lc, *lc2, *lc3;
+	if (outerrel->uniquekeys == NIL || innerrel->uniquekeys == NIL)
+		return false;
+
+	foreach(lc, outerrel->uniquekeys)
+	{
+		List	*outer_uq_exprs = lfirst_node(UniqueKey, lc)->exprs;
+		bool clauselist_matchs_all_exprs = true;
+		foreach(lc2, outer_uq_exprs)
+		{
+			Node *outer_uq_expr = lfirst(lc2);
+			bool find_uq_expr_in_clauselist = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *outer_expr;
+				if (reverse)
+					outer_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				else
+					outer_expr = rinfo->outer_is_left ? get_leftop(rinfo->clause) : get_rightop(rinfo->clause);
+				if (equal(outer_expr, outer_uq_expr))
+				{
+					find_uq_expr_in_clauselist = true;
+					break;
+				}
+			}
+			if (!find_uq_expr_in_clauselist)
+			{
+				/* No need to check the next exprs in the current uniquekey */
+				clauselist_matchs_all_exprs = false;
+				break;
+			}
+		}
+
+		if (clauselist_matchs_all_exprs)
+			/*
+			 * If the clauselist matches any uniquekey of the outerrel, the
+			 * innerrel stays unique, given that innerrel->uniquekeys != NIL,
+			 * which is checked at the beginning.
+			 */
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * relation_has_uniquekeys_for
+ *		Returns true if we have proofs that 'rel' cannot return multiple rows with
+ *		the same values in each of 'exprs'.  Otherwise returns false.
+ */
+bool
+relation_has_uniquekeys_for(PlannerInfo *root, RelOptInfo *rel, List *exprs)
+{
+	ListCell *lc;
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		if (!ukey->grantee)
+			continue;
+		if (list_all_members_in(ukey->exprs, exprs))
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * Examine the rel's restriction clauses for usable var = const clauses
+ */
+static List*
+gather_mergeable_baserestrictlist(RelOptInfo *rel)
+{
+	List	*restrictlist = NIL;
+	ListCell	*lc;
+	foreach(lc, rel->baserestrictinfo)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
+
+		/*
+		 * Note: can_join won't be set for a restriction clause, but
+		 * mergeopfamilies will be if it has a mergejoinable operator and
+		 * doesn't contain volatile functions.
+		 */
+		if (restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * The clause certainly doesn't refer to anything but the given rel.
+		 * If either side is pseudoconstant then we can use it.
+		 */
+		if (bms_is_empty(restrictinfo->left_relids))
+		{
+			/* righthand side is inner */
+			restrictinfo->outer_is_left = true;
+		}
+		else if (bms_is_empty(restrictinfo->right_relids))
+		{
+			/* lefthand side is inner */
+			restrictinfo->outer_is_left = false;
+		}
+		else
+			continue;
+
+		/* OK, add to list */
+		restrictlist = lappend(restrictlist, restrictinfo);
+	}
+	return restrictlist;
+}
+
+
+/*
+ * gather_mergeable_joinclauses
+ */
+static List*
+gather_mergeable_joinclauses(RelOptInfo *joinrel,
+							 RelOptInfo *outerrel,
+							 RelOptInfo *innerrel,
+							 List *restrictlist,
+							 JoinType jointype)
+{
+	List	*clause_list = NIL;
+	ListCell	*lc;
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *)lfirst(lc);
+		if (IS_OUTER_JOIN(jointype) &&
+			RINFO_IS_PUSHED_DOWN(restrictinfo, joinrel->relids))
+			continue;
+
+		/* Ignore if it's not a mergejoinable clause */
+		if (!restrictinfo->can_join ||
+			restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * Check if clause has the form "outer op inner" or "inner op outer",
+		 * and if so mark which side is inner.
+		 */
+		if (!clause_sides_match_join(restrictinfo, outerrel->relids, innerrel->relids))
+			continue;			/* no good for these input relations */
+
+		/* OK, add to list */
+		clause_list = lappend(clause_list, restrictinfo);
+	}
+	return clause_list;
+}
+
+
+/*
+ * Return true if uk = Const in the restrictlist
+ */
+static bool
+match_index_to_baserestrictinfo(IndexOptInfo *unique_ind, List *restrictlist)
+{
+	int c = 0;
+
+	/* A fast path to avoid the 2 loop scan */
+	if (list_length(restrictlist) < unique_ind->ncolumns)
+		return false;
+
+	for(c = 0;  c < unique_ind->ncolumns; c++)
+	{
+		ListCell	*lc;
+		bool	found_in_restrictinfo = false;
+		foreach(lc, restrictlist)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			Node	   *rexpr;
+
+			/*
+			 * The condition's equality operator must be a member of the
+			 * index opfamily, else it is not asserting the right kind of
+			 * equality behavior for this index.  We check this first
+			 * since it's probably cheaper than match_index_to_operand().
+			 */
+			if (!list_member_oid(rinfo->mergeopfamilies, unique_ind->opfamily[c]))
+				continue;
+
+			/*
+			 * XXX at some point we may need to check collations here too.
+			 * For the moment we assume all collations reduce to the same
+			 * notion of equality.
+			 */
+
+			/* OK, see if the condition operand matches the index key */
+			if (rinfo->outer_is_left)
+				rexpr = get_rightop(rinfo->clause);
+			else
+				rexpr = get_leftop(rinfo->clause);
+
+			if (match_index_to_operand(rexpr, c, unique_ind))
+			{
+				found_in_restrictinfo = true;
+				break;
+			}
+		}
+		if (!found_in_restrictinfo)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * add_uniquekey_from_index
+ *	We only add the index Vars whose exprs exist in rel->reltarget.
+ */
+static void
+add_uniquekey_from_index(RelOptInfo *rel, IndexOptInfo *unique_index)
+{
+	int	c;
+	List	*exprs = NIL;
+	List	*positions = NIL;
+	bool	grantee = true;
+
+	/* Only add the index which exists in rel->reltarget */
+	for(c = 0; c < unique_index->ncolumns; c++)
+	{
+		ListCell	*lc;
+		bool	find_in_exprs = false;
+		foreach(lc, rel->reltarget->exprs)
+		{
+			Var *var;
+			/* We never know whether a FuncExpr is nullable or not, so we only handle Vars */
+			if (!IsA(lfirst(lc), Var))
+				continue;
+			var = lfirst_node(Var, lc);
+			if (var->varattno < InvalidAttrNumber)
+				/* System column */
+				continue;
+			/* Must check NOT NULL for a unique index */
+			if (!bms_is_member(var->varattno, rel->not_null_cols))
+				grantee = false;
+
+			/* To keep the uniquekey short, we only add it if it exists in rel->reltarget->exprs */
+			if (match_index_to_operand((Node *)lfirst(lc), c, unique_index))
+			{
+				find_in_exprs = true;
+				exprs = lappend(exprs, lfirst(lc));
+				positions = lappend_int(positions, c+1);
+				break;
+			}
+		}
+		if (!find_in_exprs)
+			return;
+	}
+
+	if (exprs != NIL)
+	{
+		rel->uniquekeys = lappend(rel->uniquekeys,
+								  makeUniqueKey(exprs, positions, grantee));
+	}
+}
+
+
+/*
+ * add_uniquekey_for_onerow
+ * If we are sure the relation returns only one row, then every column
+ * is unique.
+ */
+static void
+add_uniquekey_for_onerow(RelOptInfo *rel)
+{
+	int c = 1;
+	ListCell	*lc;
+	foreach(lc,  rel->reltarget->exprs)
+	{
+		/*
+		 * Every column in this relation is unique since only 1 row is
+		 * returned.  No need to check whether it is a Var or nullable;
+		 * we can guarantee the uniqueness either way.
+		 */
+		rel->uniquekeys = lappend(rel->uniquekeys,
+								  makeUniqueKey(list_make1(lfirst(lc)),
+												list_make1_int(c),
+												true));
+		c++;
+	}
+}
+
+/*
+ * initililze_unqiuecontext_for_joinrel
+ * Return a List of UniqueKeyContexts for an inputrel; we also filter out
+ * all the uniquekeys which cannot possibly be used later.
+ */
+static List *
+initililze_unqiuecontext_for_joinrel(RelOptInfo *joinrel,  RelOptInfo *inputrel)
+{
+	List	*res = NIL;
+	ListCell *lc;
+	foreach(lc,  inputrel->uniquekeys)
+	{
+		UniqueKeyContext context;
+		context = palloc(sizeof(struct UniqueKeyContextData));
+		context->uniquekey = lfirst_node(UniqueKey, lc);
+		context->added_to_joinrel = false;
+		context->useful = true;
+		res = lappend(res, context);
+	}
+	return res;
+}
+
+/*
+ * clause_sides_match_join
+ *	  Determine whether a join clause is of the right form to use in this join.
+ *
+ * We already know that the clause is a binary opclause referencing only the
+ * rels in the current join.  The point here is to check whether it has the
+ * form "outerrel_expr op innerrel_expr" or "innerrel_expr op outerrel_expr",
+ * rather than mixing outer and inner vars on either side.  If it matches,
+ * we set the transient flag outer_is_left to identify which side is which.
+ */
+static bool
+clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
+						Relids innerrelids)
+{
+	if (bms_is_subset(rinfo->left_relids, outerrelids) &&
+		bms_is_subset(rinfo->right_relids, innerrelids))
+	{
+		/* lefthand side is outer */
+		rinfo->outer_is_left = true;
+		return true;
+	}
+	else if (bms_is_subset(rinfo->left_relids, innerrelids) &&
+			 bms_is_subset(rinfo->right_relids, outerrelids))
+	{
+		/* righthand side is outer */
+		rinfo->outer_is_left = false;
+		return true;
+	}
+	return false;				/* no good for these input relations */
+}
+
+
+/*
+ * Partitioned table Unique Keys.
+ * A unique key is maintained for a partitioned table when:
+ * 1. The index is unique as usual.
+ * 2. The index contains the partition key.
+ * 3. The index exists on all the child rels.  See simple_indexinfo_equal
+ *    for how we compare them.
+ */
+
+/*
+ * index_constains_partkey
+ * Return true if the index contains the partition key.
+ */
+static bool
+index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind)
+{
+	ListCell	*lc;
+	int	i;
+	Assert(IS_PARTITIONED_REL(partrel));
+
+	for(i = 0; i < partrel->part_scheme->partnatts; i++)
+	{
+		Node *part_expr = linitial(partrel->partexprs[i]);
+		bool found_in_index = false;
+		foreach(lc, ind->indextlist)
+		{
+			Expr *index_expr = lfirst_node(TargetEntry, lc)->expr;
+			if (equal(index_expr, part_expr))
+			{
+				found_in_index = true;
+				break;
+			}
+		}
+		if (!found_in_index)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * simple_indexinfo_equal
+ *
+ * Check whether the 2 indexes are the same.  The indexes here are COPIED
+ * from the childrel with some tiny changes applied (see
+ * simple_copy_indexinfo_to_parent).
+ */
+static bool
+simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2)
+{
+	Size oid_cmp_len = sizeof(Oid) * ind1->ncolumns;
+	return ind1->ncolumns == ind2->ncolumns &&
+		ind1->unique == ind2->unique &&
+		memcmp(ind1->indexkeys, ind2->indexkeys, sizeof(int) * ind1->ncolumns) == 0 &&
+		memcmp(ind1->opfamily, ind2->opfamily, oid_cmp_len) == 0 &&
+		memcmp(ind1->opcintype, ind2->opcintype, oid_cmp_len) == 0 &&
+		memcmp(ind1->sortopfamily, ind2->sortopfamily, oid_cmp_len) == 0 &&
+		equal(ind1->indextlist, ind2->indextlist);
+}
+
+/*
+ * Copy these macros from copyfuncs.c since we don't want to make
+ * simple_copy_indexinfo_to_parent public, given how customized a copy
+ * it is.
+ */
+
+#define COPY_POINTER_FIELD(fldname, sz) \
+	do { \
+		Size	_size = (sz); \
+		newnode->fldname = palloc(_size); \
+		memcpy(newnode->fldname, from->fldname, _size); \
+	} while (0)
+
+#define COPY_NODE_FIELD(fldname) \
+	(newnode->fldname = copyObjectImpl(from->fldname))
+
+#define COPY_SCALAR_FIELD(fldname) \
+	(newnode->fldname = from->fldname)
+
+
+/*
+ * simple_copy_indexinfo_to_parent
+ * Copy the IndexOptInfo from a child rel to the parent, which will be used to
+ * 1. Test whether the same index exists on all the childrels.
+ * 2. Test whether the parentrel's reltarget/baserestrictinfo matches this index.
+ * The copied and modified index is only used within this scope.
+ */
+static IndexOptInfo *
+simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+								IndexOptInfo *from)
+{
+	IndexOptInfo *newnode = makeNode(IndexOptInfo);
+
+	COPY_SCALAR_FIELD(ncolumns);
+	COPY_SCALAR_FIELD(nkeycolumns);
+	COPY_SCALAR_FIELD(unique);
+	COPY_SCALAR_FIELD(immediate);
+	/* We just need to know if it is NIL or not */
+	COPY_SCALAR_FIELD(indpred);
+	COPY_SCALAR_FIELD(predOK);
+	COPY_POINTER_FIELD(indexkeys, from->ncolumns * sizeof(int));
+	COPY_POINTER_FIELD(indexcollations, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opfamily, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opcintype, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(sortopfamily, from->ncolumns * sizeof(Oid));
+	COPY_NODE_FIELD(indextlist);
+
+	/*
+	 * We have to change the varnos so that the later matching of the index
+	 * against quals (like pk = 1) and rel->reltarget works.
+	 */
+	ChangeVarNodes((Node*) newnode->indextlist,
+				   from->rel->relid,
+				   parentrel->relid, 0);
+	newnode->rel = parentrel;
+	return newnode;
+}
+
+/*
+ * adjust_partition_unique_indexlist
+ *
+ * Check every index in the current known unique_index_list to see whether
+ * it also exists on the given childrel; if not, it is removed from
+ * the list.
+ */
+static void
+adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+								  RelOptInfo *childrel,
+								  List **global_unique_index)
+{
+	ListCell	*lc, *lc2;
+	foreach(lc, *global_unique_index)
+	{
+		IndexOptInfo	*g_ind = lfirst_node(IndexOptInfo, lc);
+		bool found_in_child = false;
+
+		foreach(lc2, childrel->indexlist)
+		{
+			IndexOptInfo   *p_ind = lfirst_node(IndexOptInfo, lc2);
+			IndexOptInfo   *p_ind_copy;
+			if (!p_ind->unique || !p_ind->immediate ||
+				(p_ind->indpred != NIL && !p_ind->predOK))
+				continue;
+			p_ind_copy = simple_copy_indexinfo_to_parent(parentrel, p_ind);
+			if (simple_indexinfo_equal(p_ind_copy, g_ind))
+			{
+				found_in_child = true;
+				break;
+			}
+		}
+
+		if (!found_in_child)
+			/* There is no same index on other childrel, remove it */
+			*global_unique_index = foreach_delete_current(*global_unique_index, lc);
+	}
+}
+
+/* Helper function for grouprel/distinctrel. */
+static void
+add_uniquekey_from_sortgroups(PlannerInfo *root, RelOptInfo *rel, List *sortgroups)
+{
+	Query *parse = root->parse;
+	ListCell *lc;
+	List	*exprs = NIL,  *colnos = NIL;
+	foreach(lc, sortgroups)
+	{
+		Index sortref = lfirst_node(SortGroupClause, lc)->tleSortGroupRef;
+		ListCell *lc2;			/* don't reuse lc here, or the outer loop breaks */
+		int c = 1;
+		foreach(lc2, parse->targetList)
+		{
+			TargetEntry *tle = lfirst_node(TargetEntry, lc2);
+			if (tle->ressortgroupref == sortref)
+			{
+				exprs = lappend(exprs, tle->expr);
+				colnos = lappend_int(colnos, c);
+			}
+			++c;
+		}
+	}
+	rel->uniquekeys = lappend(rel->uniquekeys, makeUniqueKey(exprs, colnos, true));
+}
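
To make the intent of these helpers concrete, here is a hedged example of
what add_uniquekey_from_index is meant to enable (table and index names are
invented for illustration):

    create table uk_demo (a int not null, b int not null, c int);
    create unique index uk_demo_ab on uk_demo (a, b);
    -- both index columns are NOT NULL, so the UniqueKey (a, b) is built
    -- with grantee = true, and a later DISTINCT over (a, b) should be erased:
    explain (costs off) select distinct a, b from uk_demo;

If either column were nullable, the key would be kept with grantee = false
and the DISTINCT would survive, matching the select_distinct_a tests below.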
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0ff660284..b96c6290b7 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -37,7 +37,6 @@ static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
 								  Relids joinrelids);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
-static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
 								List *clause_list);
 static Oid	distinct_col_search(int colno, List *colnos, List *opids);
@@ -178,14 +177,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 
 	innerrel = find_base_rel(root, innerrelid);
 
-	/*
-	 * Before we go to the effort of checking whether any innerrel variables
-	 * are needed above the join, make a quick check to eliminate cases in
-	 * which we will surely be unable to prove uniqueness of the innerrel.
-	 */
-	if (!rel_supports_distinctness(root, innerrel))
-		return false;
-
 	/* Compute the relid set for the join we are considering */
 	joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
 
@@ -535,14 +526,6 @@ reduce_unique_semijoins(PlannerInfo *root)
 
 		innerrel = find_base_rel(root, innerrelid);
 
-		/*
-		 * Before we trouble to run generate_join_implied_equalities, make a
-		 * quick check to eliminate cases in which we will surely be unable to
-		 * prove uniqueness of the innerrel.
-		 */
-		if (!rel_supports_distinctness(root, innerrel))
-			continue;
-
 		/* Compute the relid set for the join we are considering */
 		joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
 
@@ -570,54 +553,6 @@ reduce_unique_semijoins(PlannerInfo *root)
 }
 
 
-/*
- * rel_supports_distinctness
- *		Could the relation possibly be proven distinct on some set of columns?
- *
- * This is effectively a pre-checking function for rel_is_distinct_for().
- * It must return true if rel_is_distinct_for() could possibly return true
- * with this rel, but it should not expend a lot of cycles.  The idea is
- * that callers can avoid doing possibly-expensive processing to compute
- * rel_is_distinct_for()'s argument lists if the call could not possibly
- * succeed.
- */
-static bool
-rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
-{
-	/* We only know about baserels ... */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * For a plain relation, we only know how to prove uniqueness by
-		 * reference to unique indexes.  Make sure there's at least one
-		 * suitable unique index.  It must be immediately enforced, and if
-		 * it's a partial index, it must match the query.  (Keep these
-		 * conditions in sync with relation_has_unique_index_for!)
-		 */
-		ListCell   *lc;
-
-		foreach(lc, rel->indexlist)
-		{
-			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
-
-			if (ind->unique && ind->immediate &&
-				(ind->indpred == NIL || ind->predOK))
-				return true;
-		}
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Query	   *subquery = root->simple_rte_array[rel->relid]->subquery;
-
-		/* Check if the subquery has any qualities that support distinctness */
-		if (query_supports_distinctness(subquery))
-			return true;
-	}
-	/* We have no proof rules for any other rtekinds. */
-	return false;
-}
 
 /*
  * rel_is_distinct_for
@@ -640,83 +575,34 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 static bool
 rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
 {
-	/*
-	 * We could skip a couple of tests here if we assume all callers checked
-	 * rel_supports_distinctness first, but it doesn't seem worth taking any
-	 * risk for.
-	 */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * Examine the indexes to see if we have a matching unique index.
-		 * relation_has_unique_index_for automatically adds any usable
-		 * restriction clauses for the rel, so we needn't do that here.
-		 */
-		if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
-			return true;
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Index		relid = rel->relid;
-		Query	   *subquery = root->simple_rte_array[relid]->subquery;
-		List	   *colnos = NIL;
-		List	   *opids = NIL;
-		ListCell   *l;
 
-		/*
-		 * Build the argument lists for query_is_distinct_for: a list of
-		 * output column numbers that the query needs to be distinct over, and
-		 * a list of equality operators that the output columns need to be
-		 * distinct according to.
-		 *
-		 * (XXX we are not considering restriction clauses attached to the
-		 * subquery; is that worth doing?)
-		 */
-		foreach(l, clause_list)
+	ListCell	*lc1, *lc2,  *lc3;
+	foreach(lc1,  rel->uniquekeys)
+	{
+		UniqueKey *uqk = lfirst_node(UniqueKey, lc1);
+		bool all_uqk_exprs_found = true;
+		foreach(lc2, uqk->exprs)
 		{
-			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
-			Oid			op;
-			Var		   *var;
-
-			/*
-			 * Get the equality operator we need uniqueness according to.
-			 * (This might be a cross-type operator and thus not exactly the
-			 * same operator the subquery would consider; that's all right
-			 * since query_is_distinct_for can resolve such cases.)  The
-			 * caller's mergejoinability test should have selected only
-			 * OpExprs.
-			 */
-			op = castNode(OpExpr, rinfo->clause)->opno;
-
-			/* caller identified the inner side for us */
-			if (rinfo->outer_is_left)
-				var = (Var *) get_rightop(rinfo->clause);
-			else
-				var = (Var *) get_leftop(rinfo->clause);
-
-			/*
-			 * We may ignore any RelabelType node above the operand.  (There
-			 * won't be more than one, since eval_const_expressions() has been
-			 * applied already.)
-			 */
-			if (var && IsA(var, RelabelType))
-				var = (Var *) ((RelabelType *) var)->arg;
-
-			/*
-			 * If inner side isn't a Var referencing a subquery output column,
-			 * this clause doesn't help us.
-			 */
-			if (!var || !IsA(var, Var) ||
-				var->varno != relid || var->varlevelsup != 0)
-				continue;
-
-			colnos = lappend_int(colnos, var->varattno);
-			opids = lappend_oid(opids, op);
+			Node *uq_expr = lfirst(lc2);
+			bool find_uq_exprs_in_clause_list = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *clause_expr = rinfo->outer_is_left ?
+					get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				if (equal(uq_expr, clause_expr))
+				{
+					find_uq_exprs_in_clause_list = true;
+					break;
+				}
+			}
+			if (!find_uq_exprs_in_clause_list)
+			{
+				all_uqk_exprs_found = false;
+				break;
+			}
 		}
-
-		if (query_is_distinct_for(subquery, colnos, opids))
+		if (all_uqk_exprs_found)
+			/* This UniqueKey is what we want */
 			return true;
 	}
 	return false;
@@ -976,13 +862,6 @@ innerrel_is_unique(PlannerInfo *root,
 	if (restrictlist == NIL)
 		return false;
 
-	/*
-	 * Make a quick check to eliminate cases in which we will surely be unable
-	 * to prove uniqueness of the innerrel.
-	 */
-	if (!rel_supports_distinctness(root, innerrel))
-		return false;
-
 	/*
 	 * Query the cache to see if we've managed to prove that innerrel is
 	 * unique for any subset of this outerrel.  We don't need an exact match,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e978b491f6..be78d061ae 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -830,6 +830,15 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		{
 			Node	   *qual = (Node *) lfirst(l);
 
+			/* Record which columns the quals prove to be not-null */
+			ListCell	*lc;
+			foreach(lc, find_nonnullable_vars(qual))
+			{
+				Var *var = lfirst_node(Var, lc);
+				RelOptInfo *rel = root->simple_rel_array[var->varno];
+				if (var->varattno > InvalidAttrNumber)
+					rel->not_null_cols = bms_add_member(rel->not_null_cols, var->varattno);
+			}
 			distribute_qual_to_rels(root, qual,
 									false, below_outer_join, JOIN_INNER,
 									root->qual_security_level,
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..6ad73cb57b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -213,19 +213,6 @@ query_planner(PlannerInfo *root,
 	 */
 	fix_placeholder_input_needed_levels(root);
 
-	/*
-	 * Remove any useless outer joins.  Ideally this would be done during
-	 * jointree preprocessing, but the necessary information isn't available
-	 * until we've built baserel data structures and classified qual clauses.
-	 */
-	joinlist = remove_useless_joins(root, joinlist);
-
-	/*
-	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
-	 * Likewise, this can't be done until now for lack of needed info.
-	 */
-	reduce_unique_semijoins(root);
-
 	/*
 	 * Now distribute "placeholders" to base rels as needed.  This has to be
 	 * done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b65abf6046..8043fc4382 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2386,6 +2386,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		add_path(final_rel, path);
 	}
 
+	simple_copy_uniquekeys(current_rel, final_rel);
+
 	/*
 	 * Generate partial paths for final_rel, too, if outer query levels might
 	 * be able to make use of them.
@@ -3810,6 +3812,23 @@ create_grouping_paths(PlannerInfo *root,
 	Query	   *parse = root->parse;
 	RelOptInfo *grouped_rel;
 	RelOptInfo *partially_grouped_rel;
+	List	*required_unique_keys = NIL;
+
+	if (root->parse->groupingSets == NIL)
+	{
+		required_unique_keys  = get_sortgrouplist_exprs(parse->groupClause,
+														parse->targetList);
+		/*
+		 * If the groupby clauses are unique already, the grouping node is
+		 * not necessary as long as there are no aggregate functions.
+		 */
+		if (required_unique_keys != NIL &&
+			!parse->hasAggs &&
+			!parse->hasWindowFuncs &&
+			parse->havingQual == NULL &&
+			relation_has_uniquekeys_for(root, input_rel, required_unique_keys))
+			return input_rel;
+	}
 
 	/*
 	 * Create grouping relation to hold fully aggregated grouping and/or
@@ -3898,6 +3917,8 @@ create_grouping_paths(PlannerInfo *root,
 	}
 
 	set_cheapest(grouped_rel);
+
+	populate_grouprel_uniquekeys(root, grouped_rel);
 	return grouped_rel;
 }
 
@@ -4615,7 +4636,7 @@ create_window_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(window_rel);
-
+	simple_copy_uniquekeys(input_rel, window_rel);
 	return window_rel;
 }
 
@@ -4734,6 +4755,12 @@ create_distinct_paths(PlannerInfo *root,
 	bool		allow_hash;
 	Path	   *path;
 	ListCell   *lc;
+	List	   *required_unique_keys =  get_sortgrouplist_exprs(parse->distinctClause,
+																parse->targetList);
+
+	/* If the result is unique already, just return the input_rel directly */
+	if (relation_has_uniquekeys_for(root, input_rel, required_unique_keys))
+		return input_rel;
 
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
@@ -4912,7 +4939,7 @@ create_distinct_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(distinct_rel);
-
+	populate_distinctrel_uniquekeys(root, input_rel, distinct_rel);
 	return distinct_rel;
 }
 
@@ -5060,6 +5087,8 @@ create_ordered_paths(PlannerInfo *root,
 	 */
 	Assert(ordered_rel->pathlist != NIL);
 
+	simple_copy_uniquekeys(input_rel, ordered_rel);
+
 	return ordered_rel;
 }
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 951aed80e7..8aa4d24cb0 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -689,6 +689,8 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	/* Undo effects of possibly forcing tuple_fraction to 0 */
 	root->tuple_fraction = save_fraction;
 
+	/* Add the UniqueKeys */
+	populate_unionrel_uniquiekeys(root, result_rel);
 	return result_rel;
 }
 
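A hedged sketch of what populate_unionrel_uniquiekeys buys us (table names
invented; this shows the intended behavior of the patch, not a verified plan):

    create table u1 (x int);
    create table u2 (x int);
    -- UNION (without ALL) de-duplicates, so the union's output columns form
    -- a UniqueKey; a DISTINCT layered on top should then be a no-op:
    explain (costs off)
    select distinct * from (select x from u1 union select x from u2) s;

This also feeds join removal: a left join whose inner side is such a union,
keyed on the join column, can be dropped, as the join.out changes below show.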
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d82fc5ab8b..34d30b181c 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -460,6 +461,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->not_null_cols == NULL);
+	for(i = 0; i < relation->rd_att->natts; i++)
+	{
+		if (relation->rd_att->attrs[i].attnotnull)
+			rel->not_null_cols = bms_add_member(rel->not_null_cols, i+1);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 31d9aedeeb..d4de97016c 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -16,6 +16,7 @@
 
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 
 extern A_Expr *makeA_Expr(A_Expr_Kind kind, List *name,
@@ -105,4 +106,5 @@ extern GroupingSet *makeGroupingSet(GroupingSetKind kind, List *content, int loc
 
 extern VacuumRelation *makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols);
 
+extern UniqueKey *makeUniqueKey(List *exprs, List *positions, bool grantee);
 #endif							/* MAKEFUNC_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8a76afe8cc..679cc4cc9c 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ceb809644..7b23b4ec58 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -687,6 +687,7 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	Relids		not_null_cols;	/* the not-null columns of this relation, numbered from 1 */
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
@@ -706,6 +707,7 @@ typedef struct RelOptInfo
 	QualCost	baserestrictcost;	/* cost of evaluating the above */
 	Index		baserestrict_min_security;	/* min security_level found in
 											 * baserestrictinfo */
+	List	   *uniquekeys;		/* List of UniqueKey */
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
 	bool		has_eclass_joins;	/* T means joininfo is incomplete */
@@ -1017,6 +1019,23 @@ typedef struct PathKey
 } PathKey;
 
 
+/*
+ * UniqueKey
+ *
+ * Represents the unique properties held by a RelOptInfo.
+ * exprs is a list of exprs which are unique on the current RelOptInfo.
+ * positions is a list of the corresponding exprs' locations in the current
+ * reloptinfo->reltarget.  It will be used to translate the exprs' info
+ * in a subquery.
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+	List	   *exprs;
+	List	   *positions;
+	bool		grantee;
+} UniqueKey;
+
 /*
  * PathTarget
  *
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 14ea2766ad..5dfb93895c 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -528,6 +528,8 @@ extern bool list_member_ptr(const List *list, const void *datum);
 extern bool list_member_int(const List *list, int datum);
 extern bool list_member_oid(const List *list, Oid datum);
 
+extern bool list_all_members_in(const List *members, const List *target);
+
 extern List *list_delete(List *list, void *datum);
 extern List *list_delete_ptr(List *list, void *datum);
 extern List *list_delete_int(List *list, int datum);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..f1967d15c2 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -240,4 +240,41 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ */
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel,
+										List* unique_index_list);
+extern void populate_partitionedrel_uniquekeys(PlannerInfo *root,
+												RelOptInfo *rel,
+												List *childrels);
+extern void populate_distinctrel_uniquekeys(PlannerInfo *root,
+											RelOptInfo *inputrel,
+											RelOptInfo *distinctrel);
+extern void populate_grouprel_uniquekeys(PlannerInfo *root,
+										 RelOptInfo *grouprel);
+extern void populate_unionrel_uniquiekeys(PlannerInfo *root,
+										  RelOptInfo *unionrel);
+extern void simple_copy_uniquekeys(RelOptInfo *oldrel,
+								   RelOptInfo *newrel);
+extern void convert_subquery_uniquekeys(PlannerInfo *root,
+										RelOptInfo *currel,
+										RelOptInfo *sub_final_rel);
+extern void populate_joinrel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *rel1,
+										RelOptInfo *rel2,
+										List *restrictlist,
+										JoinType jointype);
+
+extern bool innerl_is_unique_v2(PlannerInfo *root,
+								RelOptInfo *outerrel,
+								RelOptInfo *innerrel,
+								List *restrictlist);
+extern bool relation_has_uniquekeys_for(PlannerInfo *root,
+										RelOptInfo *rel,
+										List *exprs);
+
 #endif							/* PATHS_H */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 14cdcfcca6..4a701af85b 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -870,14 +870,12 @@ explain (costs off)
   select distinct max(unique2) from tenk1;
                              QUERY PLAN                              
 ---------------------------------------------------------------------
- HashAggregate
-   Group Key: $0
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
                  Index Cond: (unique2 IS NOT NULL)
-   ->  Result
-(7 rows)
+(5 rows)
 
 select distinct max(unique2) from tenk1;
  max  
@@ -1036,7 +1034,7 @@ explain (costs off)
   select distinct min(f1), max(f1) from minmaxtest;
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
- Unique
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Merge Append
@@ -1059,10 +1057,7 @@ explain (costs off)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
-   ->  Sort
-         Sort Key: ($0), ($1)
-         ->  Result
-(26 rows)
+(23 rows)
 
 select distinct min(f1), max(f1) from minmaxtest;
  min | max 
@@ -1092,12 +1087,10 @@ create temp table t2 (x int, y int, z int, primary key (x, y));
 create temp table t3 (a int, b int, c int, primary key(a, b) deferrable);
 -- Non-primary-key columns can be removed from GROUP BY
 explain (costs off) select * from t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 -- No removal can happen if the complete PK is not present in GROUP BY
 explain (costs off) select a,c from t1 group by a,c,d;
@@ -1112,31 +1105,27 @@ explain (costs off) select a,c from t1 group by a,c,d;
 explain (costs off) select *
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.y
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.z
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Cannot optimize when PK is deferrable
 explain (costs off) select * from t3 group by a,b,c;
@@ -1161,12 +1150,10 @@ explain (costs off) select * from t1 group by a,b,c,d;
 
 -- Okay to remove columns if we're only querying the parent.
 explain (costs off) select * from only t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 create temp table p_t1 (
   a int,
@@ -1179,14 +1166,12 @@ create temp table p_t1_1 partition of p_t1 for values in(1);
 create temp table p_t1_2 partition of p_t1 for values in(2);
 -- Ensure we can remove non-PK columns for partitioned tables.
 explain (costs off) select * from p_t1 group by a,b,c,d;
-           QUERY PLAN           
---------------------------------
- HashAggregate
-   Group Key: p_t1.a, p_t1.b
-   ->  Append
-         ->  Seq Scan on p_t1_1
-         ->  Seq Scan on p_t1_2
-(5 rows)
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on p_t1_1
+   ->  Seq Scan on p_t1_2
+(3 rows)
 
 drop table t1 cascade;
 NOTICE:  drop cascades to table t1c
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 761376b007..6e5b8f83f4 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4414,37 +4414,25 @@ select d.* from d left join (select distinct * from b) s
 -- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
 -- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
 -- but this happens too late for join removal in the outer plan level.)
+-- XXX: since b.id is known unique now, the GROUP BY clause is erased, so
+-- the join removal can happen now.
 explain (costs off)
 select d.* from d left join (select * from b group by b.id, b.c_id) s
   on d.a = s.id;
-                QUERY PLAN                
-------------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Group
-         Group Key: b.id
-         ->  Index Scan using b_pkey on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(8 rows)
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
 
 -- similarly, but keying off a DISTINCT clause
+-- XXX: we erase the DISTINCT clause, so the join removal is OK as well.
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(9 rows)
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 11c6f50fbf..e9263d6151 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -306,3 +306,398 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+create table select_distinct_a(pk1 int, pk2 char(20),  uk1 char(20) not null,  uk2 int, e int, primary key(pk1, pk2));
+create unique index select_distinct_a_uk on select_distinct_a(uk1, uk2);
+create table select_distinct_b(a int, b char(20), pk1 char(20), pk2 int, e int, primary key(pk1, pk2));
+-- distinct erased since (pk1, pk2) is the primary key
+explain (costs off) select distinct * from select_distinct_a;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+(1 row)
+
+-- distinct can't be erased since we require all the unique-key columns to be not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a;
+             QUERY PLAN              
+-------------------------------------
+ HashAggregate
+   Group Key: uk1, uk2
+   ->  Seq Scan on select_distinct_a
+(3 rows)
+
+-- distinct erased since uk + not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 IS NOT NULL)
+(2 rows)
+
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 > 1;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 > 1)
+(2 rows)
+
+-- distinct erased due to group by
+explain (costs off) select distinct e from select_distinct_a group by e;
+             QUERY PLAN              
+-------------------------------------
+ HashAggregate
+   Group Key: e
+   ->  Seq Scan on select_distinct_a
+(3 rows)
+
+-- distinct erased due to the restrictinfo
+explain (costs off) select distinct uk1 from select_distinct_a where pk1 = 1 and pk2 = 'c';
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Index Scan using select_distinct_a_pkey on select_distinct_a
+   Index Cond: ((pk1 = 1) AND (pk2 = 'c'::bpchar))
+(2 rows)
+
+-- test join
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+insert into select_distinct_a values(1, 'a', 'a', 0, 1), (1, 'b', 'A', 0, 2), (3, 'c', 'c', 0, 3);
+insert into select_distinct_b values(1, 'a', 'a', 0, 1), (4, 'd', 'd', 0, 4), (1, 'e', 'e', 0, 5);
+ANALYZE select_distinct_a;
+ANALYZE select_distinct_b;
+-- Cartesian join
+explain (costs off) select distinct a.uk1, a.uk2, b.pk1, b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null;
+                 QUERY PLAN                  
+---------------------------------------------
+ Nested Loop
+   ->  Seq Scan on select_distinct_a a
+         Filter: (uk2 IS NOT NULL)
+   ->  Materialize
+         ->  Seq Scan on select_distinct_b b
+(5 rows)
+
+select distinct a.uk1 COLLATE "C", a.uk2, b.pk1 COLLATE "C", b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null order by 1, 2, 3, 4;
+         uk1          | uk2 |         pk1          | pk2 
+----------------------+-----+----------------------+-----
+ A                    |   0 | a                    |   0
+ A                    |   0 | d                    |   0
+ A                    |   0 | e                    |   0
+ a                    |   0 | a                    |   0
+ a                    |   0 | d                    |   0
+ a                    |   0 | e                    |   0
+ c                    |   0 | a                    |   0
+ c                    |   0 | d                    |   0
+ c                    |   0 | e                    |   0
+(9 rows)
+
+-- normal join
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_a t1, select_distinct_b t2 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Nested Loop
+   Join Filter: ((t1.pk2 = t2.pk1) AND (t1.e = t2.pk2))
+   ->  Seq Scan on select_distinct_a t1
+   ->  Materialize
+         ->  Seq Scan on select_distinct_b t2
+(5 rows)
+
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_b t2, select_distinct_a t1 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Nested Loop
+   Join Filter: ((t2.pk1 = t1.pk2) AND (t2.pk2 = t1.e))
+   ->  Seq Scan on select_distinct_b t2
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a t1
+(5 rows)
+
+-- left join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a);
+                 QUERY PLAN                  
+---------------------------------------------
+ Nested Loop Left Join
+   Join Filter: (a.pk1 = b.a)
+   ->  Seq Scan on select_distinct_a a
+   ->  Materialize
+         ->  Seq Scan on select_distinct_b b
+(5 rows)
+
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+ pk1 |         pk2          |         pk1          | pk2 
+-----+----------------------+----------------------+-----
+   1 | a                    | a                    |   0
+   1 | a                    | e                    |   0
+   1 | b                    | a                    |   0
+   1 | b                    | e                    |   0
+   3 | c                    |                      |    
+(5 rows)
+
+-- right join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a);
+                 QUERY PLAN                  
+---------------------------------------------
+ Nested Loop Left Join
+   Join Filter: (a.pk1 = b.a)
+   ->  Seq Scan on select_distinct_b b
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a a
+(5 rows)
+
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+ pk1 |         pk2          |         pk1          | pk2 
+-----+----------------------+----------------------+-----
+   1 | a                    | a                    |   0
+   1 | a                    | e                    |   0
+   1 | b                    | a                    |   0
+   1 | b                    | e                    |   0
+     |                      | d                    |   0
+(5 rows)
+
+-- full join
+explain (costs off)  select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a);
+                 QUERY PLAN                  
+---------------------------------------------
+ Hash Full Join
+   Hash Cond: (a.pk1 = b.a)
+   ->  Seq Scan on select_distinct_a a
+   ->  Hash
+         ->  Seq Scan on select_distinct_b b
+(5 rows)
+
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+ pk1 |         pk2          |         pk1          | pk2 
+-----+----------------------+----------------------+-----
+   1 | a                    | a                    |   0
+   1 | a                    | e                    |   0
+   1 | b                    | a                    |   0
+   1 | b                    | e                    |   0
+   3 | c                    |                      |    
+     |                      | d                    |   0
+(6 rows)
+
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1 from select_distinct_a a inner join select_distinct_b b on (a.pk1 = b.a);
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: a.pk1, a.pk2, b.pk1
+         ->  Nested Loop
+               Join Filter: (a.pk1 = b.a)
+               ->  Seq Scan on select_distinct_a a
+               ->  Materialize
+                     ->  Seq Scan on select_distinct_b b
+(8 rows)
+
+-- Semi/anti join
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 in (select a from select_distinct_b);
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Nested Loop Semi Join
+   Join Filter: (select_distinct_a.pk1 = select_distinct_b.a)
+   ->  Seq Scan on select_distinct_a
+   ->  Materialize
+         ->  Seq Scan on select_distinct_b
+(5 rows)
+
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 not in (select a from select_distinct_b);
+              QUERY PLAN               
+---------------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on select_distinct_b
+(4 rows)
+
+-- Test subquery
+explain (costs off) select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Nested Loop
+   Join Filter: (a.pk1 = select_distinct_b.a)
+   ->  Seq Scan on select_distinct_a a
+   ->  Materialize
+         ->  HashAggregate
+               Group Key: select_distinct_b.a
+               ->  Seq Scan on select_distinct_b
+(7 rows)
+
+select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a order by 1, 2, 3;
+ pk1 |         pk2          |         uk1          | uk2 | e | a 
+-----+----------------------+----------------------+-----+---+---
+   1 | a                    | a                    |   0 | 1 | 1
+   1 | b                    | A                    |   0 | 2 | 1
+(2 rows)
+
+explain (costs off) select distinct * from select_distinct_a a,  (select distinct a from select_distinct_b) b where a.pk1 = b.a;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Nested Loop
+   Join Filter: (a.pk1 = select_distinct_b.a)
+   ->  Seq Scan on select_distinct_a a
+   ->  Materialize
+         ->  HashAggregate
+               Group Key: select_distinct_b.a
+               ->  Seq Scan on select_distinct_b
+(7 rows)
+
+select distinct * from select_distinct_a a, (select distinct a from select_distinct_b) b where a.pk1 = b.a order by 1 ,2, 3;
+ pk1 |         pk2          |         uk1          | uk2 | e | a 
+-----+----------------------+----------------------+-----+---+---
+   1 | a                    | a                    |   0 | 1 | 1
+   1 | b                    | A                    |   0 | 2 | 1
+(2 rows)
+
+-- Distinct On
+-- can't erase since pk2 is missing
+explain (costs off) select distinct on(pk1) pk1, pk2 from select_distinct_a;
+                QUERY PLAN                 
+-------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: pk1
+         ->  Seq Scan on select_distinct_a
+(4 rows)
+
+-- ok to erase
+explain (costs off) select distinct on(pk1, pk2) pk1, pk2 from select_distinct_a;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+(1 row)
+
+-- Test partitioned table
+create table dist_p (a int, b int not null, c int not null, d int) partition by  range (b);
+create table dist_p0 partition of dist_p for values from (1) to (10);
+create table dist_p1 partition of dist_p for values from (11) to (20);
+-- The combined UKs in the target list
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.maxc from select_distinct_a t1, (select b, max(c) as maxc from dist_p group by b) t2;
+                   QUERY PLAN                   
+------------------------------------------------
+ Nested Loop
+   ->  HashAggregate
+         Group Key: dist_p.b
+         ->  Append
+               ->  Seq Scan on dist_p0 dist_p_1
+               ->  Seq Scan on dist_p1 dist_p_2
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a t1
+(8 rows)
+
+-- no uk for t2, so distinct is needed
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: t1.pk1, t1.pk2, t2.b, t2.c
+   ->  Nested Loop
+         ->  Append
+               ->  Seq Scan on dist_p0 t2_1
+               ->  Seq Scan on dist_p1 t2_2
+         ->  Materialize
+               ->  Seq Scan on select_distinct_a t1
+(8 rows)
+
+-- create unique index on dist_p
+create unique index dist_p_uk_b_c on dist_p(b, c);
+-- (t2.b, t2.c) should be the UK
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                  QUERY PLAN                  
+----------------------------------------------
+ Nested Loop
+   ->  Append
+         ->  Seq Scan on dist_p0 t2_1
+         ->  Seq Scan on dist_p1 t2_2
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a t1
+(6 rows)
+
+drop index dist_p_uk_b_c;
+-- we also support unique indexes created on each child table
+create unique index dist_p0_uk_bc on dist_p0(b, c);
+-- not ok, since dist_p1 does not have the exact same index
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: t1.pk1, t1.pk2, t2.b, t2.c
+   ->  Nested Loop
+         ->  Append
+               ->  Seq Scan on dist_p0 t2_1
+               ->  Seq Scan on dist_p1 t2_2
+         ->  Materialize
+               ->  Seq Scan on select_distinct_a t1
+(8 rows)
+
+create unique index dist_p1_uk_bc on dist_p1(b, c);
+-- OK now
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                  QUERY PLAN                  
+----------------------------------------------
+ Nested Loop
+   ->  Append
+         ->  Seq Scan on dist_p0 t2_1
+         ->  Seq Scan on dist_p1 t2_2
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a t1
+(6 rows)
+
+-- the uk is the same on all child tables, but it doesn't include the partkey, so not ok either.
+create unique index dist_p0_uk_c on dist_p0(c);
+create unique index dist_p1_uk_c on dist_p1(c);
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.c from select_distinct_a t1, dist_p t2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: t1.pk1, t1.pk2, t2.c
+   ->  Nested Loop
+         ->  Append
+               ->  Seq Scan on dist_p0 t2_1
+               ->  Seq Scan on dist_p1 t2_2
+         ->  Materialize
+               ->  Seq Scan on select_distinct_a t1
+(8 rows)
+
+drop table dist_p;
+-- test views.
+create view distinct_v1 as select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+explain (costs off) select * from distinct_v1;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 IS NOT NULL)
+(2 rows)
+
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) select * from distinct_v1;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: select_distinct_a.uk1, select_distinct_a.uk2
+         ->  Seq Scan on select_distinct_a
+               Filter: (uk2 IS NOT NULL)
+(5 rows)
+
+alter table select_distinct_a alter column uk1 set not null;
+-- test generic plan
+prepare pt as select * from distinct_v1;
+explain (costs off)  execute pt;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 IS NOT NULL)
+(2 rows)
+
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) execute pt;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: select_distinct_a.uk1, select_distinct_a.uk2
+         ->  Seq Scan on select_distinct_a
+               Filter: (uk2 IS NOT NULL)
+(5 rows)
+
+drop view distinct_v1;
+drop table select_distinct_a;
+drop table select_distinct_b;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 5fc6617369..d6cc8fa845 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1515,11 +1515,14 @@ select d.* from d left join (select distinct * from b) s
 -- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
 -- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
 -- but this happens too late for join removal in the outer plan level.)
+-- XXX: since b.id is unique now, the GROUP BY clause is erased, so
+-- the join removal can happen now.
 explain (costs off)
 select d.* from d left join (select * from b group by b.id, b.c_id) s
   on d.a = s.id;
 
 -- similarly, but keying off a DISTINCT clause
+-- XXX: we erase the DISTINCT clause, so the join removal is OK as well.
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 33102744eb..08d1e35095 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -135,3 +135,128 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+create table select_distinct_a(pk1 int, pk2 char(20),  uk1 char(20) not null,  uk2 int, e int, primary key(pk1, pk2));
+create unique index select_distinct_a_uk on select_distinct_a(uk1, uk2);
+create table select_distinct_b(a int, b char(20), pk1 char(20), pk2 int, e int, primary key(pk1, pk2));
+
+-- distinct erased since (pk1, pk2) is the primary key
+explain (costs off) select distinct * from select_distinct_a;
+
+-- distinct can't be erased since we require all the unique-key columns to be not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a;
+
+-- distinct erased since uk + not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 > 1;
+
+-- distinct erased due to group by
+explain (costs off) select distinct e from select_distinct_a group by e;
+
+-- distinct erased due to the restrictinfo
+explain (costs off) select distinct uk1 from select_distinct_a where pk1 = 1 and pk2 = 'c';
+
+-- test join
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+insert into select_distinct_a values(1, 'a', 'a', 0, 1), (1, 'b', 'A', 0, 2), (3, 'c', 'c', 0, 3);
+insert into select_distinct_b values(1, 'a', 'a', 0, 1), (4, 'd', 'd', 0, 4), (1, 'e', 'e', 0, 5);
+
+ANALYZE select_distinct_a;
+ANALYZE select_distinct_b;
+
+-- Cartesian join
+explain (costs off) select distinct a.uk1, a.uk2, b.pk1, b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null;
+select distinct a.uk1 COLLATE "C", a.uk2, b.pk1 COLLATE "C", b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null order by 1, 2, 3, 4;
+
+-- normal join
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_a t1, select_distinct_b t2 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_b t2, select_distinct_a t1 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+
+-- left join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a);
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+
+-- right join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a);
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+
+-- full join
+explain (costs off)  select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a);
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1 from select_distinct_a a inner join select_distinct_b b on (a.pk1 = b.a);
+
+
+-- Semi/anti join
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 in (select a from select_distinct_b);
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 not in (select a from select_distinct_b);
+
+
+-- Test subquery
+explain (costs off) select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a;
+select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a order by 1, 2, 3;
+
+explain (costs off) select distinct * from select_distinct_a a,  (select distinct a from select_distinct_b) b where a.pk1 = b.a;
+select distinct * from select_distinct_a a, (select distinct a from select_distinct_b) b where a.pk1 = b.a order by 1 ,2, 3;
+
+-- Distinct On
+-- can't erase since pk2 is missing
+explain (costs off) select distinct on(pk1) pk1, pk2 from select_distinct_a;
+-- ok to erase
+explain (costs off) select distinct on(pk1, pk2) pk1, pk2 from select_distinct_a;
+
+
+-- Test partitioned table
+create table dist_p (a int, b int not null, c int not null, d int) partition by  range (b);
+
+create table dist_p0 partition of dist_p for values from (1) to (10);
+create table dist_p1 partition of dist_p for values from (11) to (20);
+
+-- The combined UKs in the target list
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.maxc from select_distinct_a t1, (select b, max(c) as maxc from dist_p group by b) t2;
+
+-- no uk for t2, so distinct is needed
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+
+-- create unique index on dist_p
+create unique index dist_p_uk_b_c on dist_p(b, c);
+
+-- (t2.b, t2.c) should be the UK
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+
+drop index dist_p_uk_b_c;
+
+-- we also support unique indexes created on each child table
+create unique index dist_p0_uk_bc on dist_p0(b, c);
+-- not ok, since dist_p1 does not have the exact same index
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+create unique index dist_p1_uk_bc on dist_p1(b, c);
+-- OK now
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+
+-- the uk is the same on all child tables, but it doesn't include the partkey, so not ok either.
+create unique index dist_p0_uk_c on dist_p0(c);
+create unique index dist_p1_uk_c on dist_p1(c);
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.c from select_distinct_a t1, dist_p t2;
+
+drop table dist_p;
+
+-- test views.
+create view distinct_v1 as select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+explain (costs off) select * from distinct_v1;
+
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) select * from distinct_v1;
+
+alter table select_distinct_a alter column uk1 set not null;
+
+-- test generic plan
+prepare pt as select * from distinct_v1;
+explain (costs off)  execute pt;
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) execute pt;
+
+drop view distinct_v1;
+drop table select_distinct_a;
+drop table select_distinct_b;
-- 
2.21.0

#4Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#2)
1 attachment(s)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Here is an updated patch with some test case changes:

1. add "ANALYZE" command before running the explain.
2. order by with an explicit collate settings.

Thanks to Rushabh for pointing this out, or else I'd have spent much more
time figuring out why I got a different order on Windows.

3. As for postgres_fdw.sql, I just copied results.out to expected.out,
which should be correct based on the result. However, I added my
comments around that.

The issue doesn't exist at all. The confusion was introduced by a
misunderstanding of the test case (I treated count(xx) FILTER (xxx) as a
window function rather than an aggregate function), so I just fixed it
cleanly.
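For reference, the two forms I mixed up look roughly like this (a minimal
sketch against a hypothetical table t with a column c2, not taken from the
actual test case):

```sql
-- Aggregate with FILTER: collapses rows to one result per group; the
-- FILTER clause only restricts which input rows feed the aggregate.
SELECT count(*) FILTER (WHERE c2 = 6) FROM t;

-- Window function: computes over a window but keeps one output row per
-- input row, so no grouping (and no uniqueness) is implied.
SELECT count(*) OVER () FROM t;
```

The first form produces a single row, which is why the DISTINCT in the
postgres_fdw test can be erased there.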

Some other changes made in the new patch:
1. Fixed a bug in the UniqueKey calculation for OUTER joins.
2. Fixed some typos in comments.
3. Renamed the field "grantee" to "guarantee".

Best Regards
Andy Fan

Attachments:

v3-0001-Maintain-UniqueKey-at-each-RelOptInfo-this-inform.patchapplication/octet-stream; name=v3-0001-Maintain-UniqueKey-at-each-RelOptInfo-this-inform.patchDownload
From 42e4031c9255e89428cc095782d08ad95b15a07f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sat, 28 Mar 2020 13:44:54 +0800
Subject: [PATCH v3] Maintain UniqueKey at each RelOptInfo, this information
 can be used

to erase the distinct/group paths if we are sure the result is already
unique. rel_is_unique_for is also modified to use UniqueKey to
detect more cases.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   34 +-
 src/backend/nodes/equalfuncs.c                |    3 +-
 src/backend/nodes/list.c                      |   27 +
 src/backend/nodes/makefuncs.c                 |   15 +
 src/backend/optimizer/path/Makefile           |    3 +-
 src/backend/optimizer/path/allpaths.c         |   25 +-
 src/backend/optimizer/path/joinrels.c         |    2 +
 src/backend/optimizer/path/uniquekeys.c       | 1033 +++++++++++++++++
 src/backend/optimizer/plan/analyzejoins.c     |  171 +--
 src/backend/optimizer/plan/initsplan.c        |    9 +
 src/backend/optimizer/plan/planmain.c         |   13 -
 src/backend/optimizer/plan/planner.c          |   33 +-
 src/backend/optimizer/prep/prepunion.c        |    2 +
 src/backend/optimizer/util/plancat.c          |    8 +
 src/include/nodes/makefuncs.h                 |    2 +
 src/include/nodes/nodes.h                     |    1 +
 src/include/nodes/pathnodes.h                 |   23 +
 src/include/nodes/pg_list.h                   |    2 +
 src/include/optimizer/paths.h                 |   37 +
 src/test/regress/expected/aggregates.out      |   83 +-
 src/test/regress/expected/join.out            |   34 +-
 src/test/regress/expected/select_distinct.out |  395 +++++++
 src/test/regress/sql/join.sql                 |    3 +
 src/test/regress/sql/select_distinct.sql      |  125 ++
 24 files changed, 1825 insertions(+), 258 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekeys.c

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 62c2697920..ba3947c9f9 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2902,22 +2902,20 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 f
 -- Outer query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
- Unique
+                                                       QUERY PLAN                                                       
+------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: ((SubPlan 1))
-   ->  Sort
-         Output: ((SubPlan 1))
-         Sort Key: ((SubPlan 1))
-         ->  Foreign Scan
-               Output: (SubPlan 1)
-               Relations: Aggregate on (public.ft2 t2)
-               Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
-               SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+   Sort Key: ((SubPlan 1))
+   ->  Foreign Scan
+         Output: (SubPlan 1)
+         Relations: Aggregate on (public.ft2 t2)
+         Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
+         SubPlan 1
+           ->  Foreign Scan on public.ft1 t1
+                 Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                 Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(11 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
@@ -3416,7 +3414,6 @@ select sum(q.a), count(q.b) from ft4 left join (select 13, avg(ft1.c1), sum(ft2.
    Output: sum(q.a), count(q.b)
    ->  Nested Loop Left Join
          Output: q.a, q.b
-         Inner Unique: true
          Join Filter: ((ft4.c1)::numeric <= q.b)
          ->  Foreign Scan on public.ft4
                Output: ft4.c1, ft4.c2, ft4.c3
@@ -3429,7 +3426,7 @@ select sum(q.a), count(q.b) from ft4 left join (select 13, avg(ft1.c1), sum(ft2.
                            Output: 13, (avg(ft1.c1)), NULL::bigint
                            Relations: Aggregate on ((public.ft2) LEFT JOIN (public.ft1))
                            Remote SQL: SELECT 13, avg(r1."C 1"), NULL::bigint FROM ("S 1"."T 1" r2 LEFT JOIN "S 1"."T 1" r1 ON (((r1."C 1" = r2."C 1"))))
-(17 rows)
+(16 rows)
 
 select sum(q.a), count(q.b) from ft4 left join (select 13, avg(ft1.c1), sum(ft2.c1) from ft1 right join ft2 on (ft1.c1 = ft2.c1)) q(a, b, c) on (ft4.c1 <= q.b);
  sum | count 
@@ -4198,7 +4195,6 @@ explain (verbose, costs off) select * from ft3 f, loct3 l
 -------------------------------------------------------------
  Hash Join
    Output: f.f1, f.f2, f.f3, l.f1, l.f2, l.f3
-   Inner Unique: true
    Hash Cond: ((f.f3)::text = (l.f3)::text)
    ->  Foreign Scan on public.ft3 f
          Output: f.f1, f.f2, f.f3
@@ -4208,7 +4204,7 @@ explain (verbose, costs off) select * from ft3 f, loct3 l
          ->  Index Scan using loct3_f1_key on public.loct3 l
                Output: l.f1, l.f2, l.f3
                Index Cond: (l.f1 = 'foo'::text)
-(12 rows)
+(11 rows)
 
 -- ===================================================================
 -- test writable foreign table stuff
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 88b912977e..63e92d94ef 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -162,8 +162,9 @@ _equalIntoClause(const IntoClause *a, const IntoClause *b)
 static bool
 _equalVar(const Var *a, const Var *b)
 {
-	COMPARE_SCALAR_FIELD(varno);
+	/* Compare varattno first since it has higher selectivity than varno */
 	COMPARE_SCALAR_FIELD(varattno);
+	COMPARE_SCALAR_FIELD(varno);
 	COMPARE_SCALAR_FIELD(vartype);
 	COMPARE_SCALAR_FIELD(vartypmod);
 	COMPARE_SCALAR_FIELD(varcollid);
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index bd0c58cd81..c7336b10f2 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -688,6 +688,33 @@ list_member_oid(const List *list, Oid datum)
 	return false;
 }
 
+/*
+ * Return true iff there is an equal member in target for every
+ * member in members
+ */
+bool
+list_all_members_in(const List *members, const List *target)
+{
+	const ListCell	*lc1, *lc2;
+	if (target == NIL && members != NIL)
+		return false;
+	foreach(lc1, members)
+	{
+		bool found = false;
+		foreach(lc2, target)
+		{
+			if (equal(lfirst(lc1), lfirst(lc2)))
+			{
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			return false;
+	}
+	return true;
+}
+
 /*
  * Delete the n'th cell (counting from 0) in list.
  *
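The `list_all_members_in()` helper added above is a subset test under node equality rather than a sequence comparison: order and duplicates in `members` are irrelevant, and an empty `members` list is trivially contained. A standalone sketch of those semantics, using int arrays in place of List/`equal()` (the names here are illustrative, not part of the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch of list_all_members_in(): true iff every element of members
 * has an equal element in target (a containment test, not a prefix or
 * sequence match).
 */
static bool
all_members_in(const int *members, size_t nmembers,
			   const int *target, size_t ntarget)
{
	for (size_t i = 0; i < nmembers; i++)
	{
		bool		found = false;

		for (size_t j = 0; j < ntarget; j++)
		{
			if (members[i] == target[j])
			{
				found = true;
				break;
			}
		}
		if (!found)
			return false;
	}
	return true;
}
```

Note the patch's fast path: a NIL target fails only when members is non-NIL, matching the empty-members case above returning true.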
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index e8cdc90c31..7082c8b1b0 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -809,3 +809,18 @@ makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols)
 	v->va_cols = va_cols;
 	return v;
 }
+
+
+/*
+ * makeUniqueKey
+ */
+UniqueKey*
+makeUniqueKey(List *exprs, List* positions, bool guarantee)
+{
+	UniqueKey * ukey = makeNode(UniqueKey);
+	Assert(list_length(exprs) == list_length(positions));
+	ukey->exprs = exprs;
+	ukey->positions = positions;
+	ukey->guarantee = guarantee;
+	return ukey;
+}
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..7b9820c25f 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekeys.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 905bbe77d8..e7383979aa 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -39,6 +39,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
@@ -222,14 +223,23 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	set_base_rel_pathlists(root);
 
 	/*
-	 * Generate access paths for the entire join tree.
+	 * Remove any useless outer joins.  Ideally this would be done during
+	 * jointree preprocessing, but the necessary information isn't available
+	 * until we've built baserel data structures, classified qual clauses,
+	 * and populated uniquekeys.
 	 */
-	rel = make_rel_from_joinlist(root, joinlist);
+	joinlist = remove_useless_joins(root, joinlist);
 
 	/*
-	 * The result should join all and only the query's base rels.
+	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
+	 * Likewise, this can't be done until now for lack of needed info.
 	 */
-	Assert(bms_equal(rel->relids, root->all_baserels));
+	reduce_unique_semijoins(root);
+
+	/*
+	 * Generate access paths for the entire join tree.
+	 */
+	rel = make_rel_from_joinlist(root, joinlist);
 
 	return rel;
 }
@@ -786,6 +796,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
+
+	/* Set UniqueKeys for this relation */
+	populate_baserel_uniquekeys(root, rel, rel->indexlist);
 }
 
 /*
@@ -1276,6 +1289,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Add paths to the append relation. */
 	add_paths_to_append_rel(root, rel, live_childrels);
+	if (IS_PARTITIONED_REL(rel))
+		populate_partitionedrel_uniquekeys(root, rel, live_childrels);
 }
 
 
@@ -2349,6 +2364,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 										  pathkeys, required_outer));
 	}
 
+	convert_subquery_uniquekeys(root, rel, sub_final_rel);
+
 	/* If outer rel allows parallelism, do same for partial paths. */
 	if (rel->consider_parallel && bms_is_empty(required_outer))
 	{
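`convert_subquery_uniquekeys()`, hooked in here, remaps a UniqueKey's column positions from the subquery's target list to the outer query's target list, discarding keys whose columns aren't referenced outside (this mirrors the t1/t2 example in the opening mail). A minimal sketch of that remapping with int arrays standing in for the planner structures; all names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * outer_attnos[j] is the subquery column number pulled up at position
 * j+1 of the outer rel's target list (0 for non-Var entries).  Fill
 * out_positions with the outer positions of the key's columns; return
 * false if any key column is not referenced outside.
 */
static bool
remap_key_positions(const int *subq_positions, size_t nkeycols,
					const int *outer_attnos, size_t nouter,
					int *out_positions)
{
	for (size_t i = 0; i < nkeycols; i++)
	{
		bool		found = false;

		for (size_t j = 0; j < nouter; j++)
		{
			if (outer_attnos[j] == subq_positions[i])
			{
				out_positions[i] = (int) j + 1;
				found = true;
				break;
			}
		}
		if (!found)
			return false;		/* key column unused outside: drop the key */
	}
	return true;
}
```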
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index a21c295b99..c6799aa48c 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -920,6 +920,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
 
 	/* Apply partitionwise join technique, if possible. */
 	try_partitionwise_join(root, rel1, rel2, joinrel, sjinfo, restrictlist);
+
+	populate_joinrel_uniquekeys(root, joinrel, rel1, rel2, restrictlist, sjinfo->jointype);
 }
 
 
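`populate_joinrel_uniquekeys()`, called above, follows two rules implemented in the new uniquekeys.c below: a UniqueKey on the null-producing side of an outer join keeps its exprs but loses its guarantee (the join can emit duplicated NULL rows), and a key built by concatenating one key from each side is guaranteed only if both inputs are. A toy sketch of just those two rules; the types and helpers are illustrative, not the patch's API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef enum {JT_INNER, JT_LEFT, JT_FULL} JoinKind;

typedef struct
{
	const char *exprs;			/* comma-separated column list */
	bool		guarantee;
} UKey;

/*
 * A key on the null-producing side of an outer join survives, but only
 * with guarantee = false.
 */
static UKey
key_after_join(UKey k, JoinKind jk, bool side_nullable)
{
	if (side_nullable && (jk == JT_LEFT || jk == JT_FULL))
		k.guarantee = false;
	return k;
}

/*
 * The concatenation of one key per side is unique for the joinrel; it
 * is guaranteed only if both inputs are.
 */
static UKey
combine_keys(UKey a, UKey b, char *buf, size_t buflen)
{
	UKey		r;

	snprintf(buf, buflen, "%s,%s", a.exprs, b.exprs);
	r.exprs = buf;
	r.guarantee = a.guarantee && b.guarantee;
	return r;
}
```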
diff --git a/src/backend/optimizer/path/uniquekeys.c b/src/backend/optimizer/path/uniquekeys.c
new file mode 100644
index 0000000000..58fe16be55
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekeys.c
@@ -0,0 +1,1033 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekeys.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "rewrite/rewriteManip.h"
+
+
+/*
+ * This struct is used to help populate_joinrel_uniquekeys.
+ * added_to_joinrel is set to true once a uniquekey has been added to the
+ * joinrel.  For a joinrel, if both sides have a UniqueKey, then the
+ * combination of the two must be unique for the joinrel as well, but we
+ * don't need to add it if either of them has been added to the joinrel
+ * already.  We use this struct to maintain such info.
+ */
+typedef struct UniqueKeyContextData
+{
+	UniqueKey	*uniquekey;
+	/* Set to true if the unique key has been added to joinrel->uniquekeys */
+	bool	added_to_joinrel;
+	/* If this uniquekey is still useful after join */
+	bool	useful;
+} *UniqueKeyContext;
+
+
+static List *gather_mergeable_baserestrictlist(RelOptInfo *rel);
+static List *gather_mergeable_joinclauses(RelOptInfo *joinrel,
+										  RelOptInfo *rel1,
+										  RelOptInfo *rel2,
+										  List *restrictlist,
+										  JoinType jointype);
+static bool match_index_to_baserestrictinfo(IndexOptInfo *unique_ind,
+											List *restrictlist);
+static List *initililze_unqiuecontext_for_joinrel(RelOptInfo *joinrel,
+												  RelOptInfo *inputrel);
+
+static bool innerrel_keeps_unique(PlannerInfo *root,
+								  RelOptInfo *outerrel,
+								  RelOptInfo *innerrel,
+								  List *restrictlist,
+								  bool reverse);
+static bool clause_sides_match_join(RestrictInfo *rinfo,
+									Relids outerrelids,
+									Relids innerrelids);
+static void add_uniquekey_from_index(RelOptInfo *rel,
+									 IndexOptInfo *unique_index);
+static void add_uniquekey_for_onerow(RelOptInfo *rel);
+
+/* Used for unique indexes checking for partitioned table */
+static bool index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind);
+static IndexOptInfo *simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+													 IndexOptInfo *from);
+static bool simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2);
+static void adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+											  RelOptInfo *childrel,
+											  List **global_unique_index);
+/* Helper function for groupres/distinctrel */
+static void add_uniquekey_from_sortgroups(PlannerInfo *root,
+										  RelOptInfo *rel,
+										  List *sortgroups);
+
+/*
+ * populate_baserel_uniquekeys
+ *		Populate 'baserel' uniquekeys list by looking at the rel's unique index
+ * add baserestrictinfo
+ */
+void
+populate_baserel_uniquekeys(PlannerInfo *root,
+							RelOptInfo *baserel,
+							List *indexlist)
+{
+	ListCell *lc;
+	List	*restrictlist = gather_mergeable_baserestrictlist(baserel);
+	bool	return_one_row = false;
+	List	*matched_uk_indexes = NIL;
+
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	foreach(lc, indexlist)
+	{
+		IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		if (match_index_to_baserestrictinfo(ind, restrictlist))
+		{
+			return_one_row = true;
+			break;
+		}
+		/* We can't tell whether an index expression can return NULL, so ignore expression indexes */
+		if (ind->indexprs != NIL)
+			continue;
+		matched_uk_indexes = lappend(matched_uk_indexes, ind);
+	}
+
+	if (return_one_row)
+	{
+		/*
+		 * Since only one row is returned, every column is unique.
+		 */
+		add_uniquekey_for_onerow(baserel);
+	}
+	else
+	{
+		foreach(lc, matched_uk_indexes)
+			add_uniquekey_from_index(baserel, lfirst_node(IndexOptInfo, lc));
+	}
+}
+
+
+/*
+ * populate_partitionedrel_uniquekeys
+ * A unique index can be used for a UniqueKey only if:
+ * 1). it includes the partition keys, and
+ * 2). the same index exists on all the child rels.
+ */
+void
+populate_partitionedrel_uniquekeys(PlannerInfo *root,
+									RelOptInfo *rel,
+									List *childrels)
+{
+	ListCell	*lc;
+	List	*global_unique_indexlist = NIL;
+	RelOptInfo *childrel;
+	bool is_first = true;
+
+	Assert(IS_PARTITIONED_REL(rel));
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	if (childrels == NIL)
+		return;
+
+	childrel = linitial_node(RelOptInfo, childrels);
+	foreach(lc, childrel->indexlist)
+	{
+		IndexOptInfo *ind = lfirst(lc);
+		IndexOptInfo *global_ind;
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		global_ind = simple_copy_indexinfo_to_parent(rel, ind);
+		/*
+		 * If the unique index doesn't contain the partition key, then it is
+		 * unique on this partition only, so it is useless for us.
+		 */
+		if (!index_constains_partkey(rel, global_ind))
+			continue;
+		global_unique_indexlist = lappend(global_unique_indexlist,  global_ind);
+	}
+
+	/* Fast path */
+	if (global_unique_indexlist == NIL)
+		return;
+
+	foreach(lc, childrels)
+	{
+		RelOptInfo *child = lfirst(lc);
+		if (is_first)
+		{
+			is_first = false;
+			continue;
+		}
+		adjust_partition_unique_indexlist(rel, child, &global_unique_indexlist);
+	}
+
+	/* Now we have the list of unique indexes which exist identically on all
+	 * childrels.  Set the UniqueKeys just as for a non-partitioned table.
+	 */
+	populate_baserel_uniquekeys(root, rel, global_unique_indexlist);
+}
+
+
+/*
+ * populate_distinctrel_uniquekeys
+ */
+void
+populate_distinctrel_uniquekeys(PlannerInfo *root,
+									 RelOptInfo *inputrel,
+									 RelOptInfo *distinctrel)
+{
+	/* The unique keys from before the distinct are still valid */
+	distinctrel->uniquekeys = list_copy(inputrel->uniquekeys);
+	add_uniquekey_from_sortgroups(root, distinctrel, root->parse->distinctClause);
+}
+
+/*
+ * populate_grouprel_uniquekeys
+ */
+void
+populate_grouprel_uniquekeys(PlannerInfo *root,
+							 RelOptInfo *grouprel)
+{
+	Query *parse = root->parse;
+	if (parse->hasTargetSRFs)
+		return;
+	if (parse->groupingSets != NIL)
+		return;
+	if (parse->groupClause != NIL)
+		add_uniquekey_from_sortgroups(root,
+									  grouprel,
+									  root->parse->groupClause);
+	else
+		/* aggregation without GROUP BY, so only one row is returned */
+		add_uniquekey_for_onerow(grouprel);
+}
+
+/*
+ * simple_copy_uniquekeys
+ * Wrapping this one-liner in a function makes it easy to find the places
+ * where we simply copy the uniquekeys.
+ */
+void
+simple_copy_uniquekeys(RelOptInfo *oldrel,
+					   RelOptInfo *newrel)
+{
+	newrel->uniquekeys = oldrel->uniquekeys;
+}
+
+/*
+ *  populate_unionrel_uniquiekeys
+ */
+void
+populate_unionrel_uniquiekeys(PlannerInfo *root,
+							  RelOptInfo *unionrel)
+{
+	ListCell	*lc;
+	List	*exprs = NIL;
+	List	*colnos = NIL;
+	int i = 1;
+
+	Assert(unionrel->uniquekeys == NIL);
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	foreach(lc,  unionrel->reltarget->exprs)
+	{
+		exprs = lappend(exprs, lfirst(lc));
+		colnos = lappend_int(colnos, i);
+		i++;
+	}
+	unionrel->uniquekeys = lappend(unionrel->uniquekeys,
+								   makeUniqueKey(exprs, colnos, true));
+}
+
+/*
+ * populate_joinrel_uniquekeys
+ *
+ * Populate uniquekeys for the joinrel.  We check each input relation to see
+ * whether its UniqueKeys are still valid via innerrel_keeps_unique; if so, we
+ * add them to the joinrel.  The guarantee field is cleared in some outer-join
+ * cases.
+ *
+ * For a uniquekey on either side which can't stay unique after the join, we
+ * still check whether the combination of uniquekeys from both sides is useful
+ * for us; if so, we add it to the joinrel as well.  We set the combined key's
+ * guarantee field to true iff both uniquekeys have guarantee == true.
+ */
+void
+populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List *restrictlist, JoinType jointype)
+{
+	ListCell *lc, *lc2;
+	List	*clause_list = NIL;
+	List	*outerrel_uniquekey_ctx;
+	List	*innerrel_uniquekey_ctx;
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	/* Only the outerrel's uniquekeys matter for SEMI/ANTI joins */
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+	{
+		foreach(lc, outerrel->uniquekeys)
+		{
+			UniqueKey	*uniquekey = lfirst_node(UniqueKey, lc);
+			if (list_all_members_in(uniquekey->exprs, joinrel->reltarget->exprs))
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, uniquekey);
+		}
+		return;
+	}
+
+	/* Fast path */
+	if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+		return;
+
+	outerrel_uniquekey_ctx = initililze_unqiuecontext_for_joinrel(joinrel, outerrel);
+	innerrel_uniquekey_ctx = initililze_unqiuecontext_for_joinrel(joinrel, innerrel);
+
+	clause_list = gather_mergeable_joinclauses(joinrel, outerrel, innerrel,
+											   restrictlist, jointype);
+
+	if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+	{
+		foreach(lc, innerrel_uniquekey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+			if (!list_all_members_in(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				/* The UniqueKey on baserel is not useful on the joinrel */
+				ctx->useful = false;
+				continue;
+			}
+
+			if ((jointype == JOIN_LEFT || jointype == JOIN_FULL) && ctx->uniquekey->guarantee)
+			{
+				/*
+				 * We can't guarantee uniqueness anymore since the outer join
+				 * can duplicate NULL values in these columns, so we set
+				 * guarantee to false.  We intentionally don't set
+				 * ctx->added_to_joinrel, since combining with a UniqueKey
+				 * from the other relation may still yield a guaranteed key.
+				 */
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+											  makeUniqueKey(ctx->uniquekey->exprs,
+															ctx->uniquekey->positions,
+															false));
+
+			}
+			else
+			{
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, ctx->uniquekey);
+				ctx->added_to_joinrel = true;
+			}
+		}
+	}
+
+	if (innerrel_keeps_unique(root, innerrel, outerrel, clause_list, true))
+	{
+		foreach(lc, outerrel_uniquekey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+			if (!list_all_members_in(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+			/* NULL values in outer rel can be duplicated under JOIN_FULL only */
+			if (jointype == JOIN_FULL && ctx->uniquekey->guarantee)
+			{
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+											  makeUniqueKey(ctx->uniquekey->exprs,
+															ctx->uniquekey->positions,
+															false));
+
+			}
+			else
+			{
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, ctx->uniquekey);
+				ctx->added_to_joinrel = true;
+			}
+		}
+	}
+
+	/*
+	 * The combination of UniqueKeys from both sides is unique as well, but
+	 * don't bother to add it if a subset of it has already been added.
+	 */
+	foreach(lc, outerrel_uniquekey_ctx)
+	{
+		UniqueKeyContext context1 = (UniqueKeyContext) lfirst(lc);
+		if (context1->added_to_joinrel || !context1->useful)
+			continue;
+		foreach(lc2, innerrel_uniquekey_ctx)
+		{
+			UniqueKeyContext context2 = (UniqueKeyContext) lfirst(lc2);
+			List	*exprs = NIL, *colnos = NIL;
+			if (context2->added_to_joinrel || !context2->useful)
+				continue;
+			exprs = list_copy(context1->uniquekey->exprs);
+			colnos = list_copy(context1->uniquekey->positions);
+			exprs = list_concat(exprs, context2->uniquekey->exprs);
+			colnos = list_concat(colnos, context2->uniquekey->positions);
+			/*
+			 * Only if both sides have guarantee == true can we say the
+			 * uniqueness of the combination is guaranteed.
+			 */
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys, makeUniqueKey(exprs, colnos,
+																			 context1->uniquekey->guarantee &&
+																			 context2->uniquekey->guarantee));
+		}
+	}
+}
+
+
+/*
+ * Used to avoid multiple scans of rel->reltarget->exprs, see convert_subquery_uniquekeys
+ */
+typedef struct SubqueryUniqueKeyData
+{
+	/*
+	 * Only the Var reference to subquery's unique is unique as well, we can't
+	 * guarantee others
+	 */
+	Var *var;
+
+	/* The position of the var in the rel->reltarget */
+	int pos;
+} *SubqueryUniqueKeyContext;
+
+/*
+ * convert_subquery_uniquekeys
+ *
+ * currel is the RelOptInfo at the current query level; sub_final_rel comes
+ * from fetch_upper_rel().  We convert the UniqueKeys from sub_final_rel to
+ * currel via the positions info in each UniqueKey.
+ */
+void
+convert_subquery_uniquekeys(PlannerInfo *root,
+							RelOptInfo *currel,
+							RelOptInfo *sub_final_rel)
+{
+	SubqueryUniqueKeyContext *ctx_array;
+	SubqueryUniqueKeyContext ctx;
+	Index max_colno_subq = 0;
+	ListCell	*lc, *lc2;
+	int pos = 0;
+
+	if (sub_final_rel->uniquekeys == NIL)
+		/* This should be a common case */
+		return;
+
+	/*
+	 * Calculate max_colno in subquery. In fact we can check this with
+	 * list_length(sub_final_rel->reltarget->exprs), However, reltarget
+	 * is not set on UPPERREL_FINAL relation, so do it this way
+	 */
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey * ukey = lfirst_node(UniqueKey, lc);
+		foreach(lc2, ukey->positions)
+		{
+			Index colno = lfirst_int(lc2);
+			if (max_colno_subq < colno)
+				max_colno_subq = colno;
+		}
+	}
+
+	Assert(max_colno_subq > 0);
+	ctx_array = palloc0(sizeof(SubqueryUniqueKeyContext) * (max_colno_subq + 1));
+
+	/*
+	 * Build an array indexed by the subquery's column number, so that given
+	 * a colno_subq we can find the outer expr and its position quickly.
+	 */
+	foreach(lc, currel->reltarget->exprs)
+	{
+		Var *var;
+		int colno_subq;
+		pos++;
+		if (!IsA(lfirst(lc), Var))
+			continue;
+
+		var = lfirst_node(Var, lc);
+		colno_subq = var->varattno;
+		if (colno_subq > max_colno_subq)
+			continue;
+		ctx_array[colno_subq] = palloc0(sizeof(struct SubqueryUniqueKeyData));
+		ctx = ctx_array[colno_subq]; /* corresponding to subquery's uniquekey->positions[x] */
+		ctx->pos = pos; /* the position in the current targetlist, used to set the UniqueKey */
+		ctx->var = var;
+	}
+
+	/* Convert the UniqueKeys from sub_final_rel to currel */
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey * ukey = lfirst_node(UniqueKey, lc);
+		bool uniquekey_useful = true;
+		List	*exprs = NIL;
+		List	*colnos = NIL;
+		foreach(lc2, ukey->positions)
+		{
+			Index sub_colno = lfirst_int(lc2);
+			ctx = ctx_array[sub_colno];
+			if (ctx == NULL)
+			{
+				/* The column is not used outside */
+				uniquekey_useful = false;
+				break;
+			}
+			exprs = lappend(exprs, ctx->var);
+			colnos = lappend_int(colnos, ctx->pos);
+		}
+		if (uniquekey_useful)
+			currel->uniquekeys = lappend(currel->uniquekeys,
+										 makeUniqueKey(exprs, colnos, ukey->guarantee));
+	}
+}
+
+
+/*
+ * innerrel_keeps_unique
+ *
+ * Check if the UniqueKeys of the innerrel stay valid after the join.  They
+ * do if some uniquekey of the outerrel is fully matched by "outerrel_expr
+ * mergeop innerrel_expr" clauses in clause_list.
+ * Note: clause_list must already contain only mergeable restrictinfos.
+ */
+static bool
+innerrel_keeps_unique(PlannerInfo *root,
+					  RelOptInfo *outerrel,
+					  RelOptInfo *innerrel,
+					  List *clause_list,
+					  bool reverse)
+{
+	ListCell	*lc, *lc2, *lc3;
+
+	if (outerrel->uniquekeys == NIL || innerrel->uniquekeys == NIL)
+		return false;
+
+	foreach(lc, outerrel->uniquekeys)
+	{
+		List	*outer_uq_exprs = lfirst_node(UniqueKey, lc)->exprs;
+		bool clauselist_matchs_all_exprs = true;
+		foreach(lc2, outer_uq_exprs)
+		{
+			Node *outer_uq_expr = lfirst(lc2);
+			bool find_uq_expr_in_clauselist = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *outer_expr;
+				if (reverse)
+					outer_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				else
+					outer_expr = rinfo->outer_is_left ? get_leftop(rinfo->clause) : get_rightop(rinfo->clause);
+				if (equal(outer_expr, outer_uq_expr))
+				{
+					find_uq_expr_in_clauselist = true;
+					break;
+				}
+			}
+			if (!find_uq_expr_in_clauselist)
+			{
+				/* No need to check the next exprs in the current uniquekey */
+				clauselist_matchs_all_exprs = false;
+				break;
+			}
+		}
+
+		if (clauselist_matchs_all_exprs)
+			/*
+			 * If the clause list matches every expr of some uniquekey from
+			 * the outerrel, the innerrel stays unique, given that
+			 * innerrel->uniquekeys != NIL was checked at the beginning.
+			 */
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * relation_has_uniquekeys_for
+ *		Returns true if we have proofs that 'rel' cannot return multiple rows with
+ *		the same values in each of 'exprs'.  Otherwise returns false.
+ */
+bool
+relation_has_uniquekeys_for(PlannerInfo *root, RelOptInfo *rel, List *exprs)
+{
+	ListCell *lc;
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		if (!ukey->guarantee)
+			continue;
+		if (list_all_members_in(ukey->exprs, exprs))
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * Examine the rel's restriction clauses for usable var = const clauses
+ */
+static List*
+gather_mergeable_baserestrictlist(RelOptInfo *rel)
+{
+	List	*restrictlist = NIL;
+	ListCell	*lc;
+	foreach(lc, rel->baserestrictinfo)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
+
+		/*
+		 * Note: can_join won't be set for a restriction clause, but
+		 * mergeopfamilies will be if it has a mergejoinable operator and
+		 * doesn't contain volatile functions.
+		 */
+		if (restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * The clause certainly doesn't refer to anything but the given rel.
+		 * If either side is pseudoconstant then we can use it.
+		 */
+		if (bms_is_empty(restrictinfo->left_relids))
+		{
+			/* righthand side is inner */
+			restrictinfo->outer_is_left = true;
+		}
+		else if (bms_is_empty(restrictinfo->right_relids))
+		{
+			/* lefthand side is inner */
+			restrictinfo->outer_is_left = false;
+		}
+		else
+			continue;
+
+		/* OK, add to list */
+		restrictlist = lappend(restrictlist, restrictinfo);
+	}
+	return restrictlist;
+}
+
+
+/*
+ * gather_mergeable_joinclauses
+ */
+static List*
+gather_mergeable_joinclauses(RelOptInfo *joinrel,
+							 RelOptInfo *outerrel,
+							 RelOptInfo *innerrel,
+							 List *restrictlist,
+							 JoinType jointype)
+{
+	List	*clause_list = NIL;
+	ListCell	*lc;
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *)lfirst(lc);
+		if (IS_OUTER_JOIN(jointype) &&
+			RINFO_IS_PUSHED_DOWN(restrictinfo, joinrel->relids))
+			continue;
+
+		/* Ignore if it's not a mergejoinable clause */
+		if (!restrictinfo->can_join ||
+			restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * Check if clause has the form "outer op inner" or "inner op outer",
+		 * and if so mark which side is inner.
+		 */
+		if (!clause_sides_match_join(restrictinfo, outerrel->relids, innerrel->relids))
+			continue;			/* no good for these input relations */
+
+		/* OK, add to list */
+		clause_list = lappend(clause_list, restrictinfo);
+	}
+	return clause_list;
+}
+
+
+/*
+ * Return true if every column of the unique index is matched by a
+ * "col = Const" clause in restrictlist
+ */
+static bool
+match_index_to_baserestrictinfo(IndexOptInfo *unique_ind, List *restrictlist)
+{
+	int c = 0;
+
+	/* A fast path to avoid the double-loop scan below */
+	if (list_length(restrictlist) < unique_ind->ncolumns)
+		return false;
+
+	for(c = 0;  c < unique_ind->ncolumns; c++)
+	{
+		ListCell	*lc;
+		bool	found_in_restrictinfo = false;
+		foreach(lc, restrictlist)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			Node	   *rexpr;
+
+			/*
+			 * The condition's equality operator must be a member of the
+			 * index opfamily, else it is not asserting the right kind of
+			 * equality behavior for this index.  We check this first
+			 * since it's probably cheaper than match_index_to_operand().
+			 */
+			if (!list_member_oid(rinfo->mergeopfamilies, unique_ind->opfamily[c]))
+				continue;
+
+			/*
+			 * XXX at some point we may need to check collations here too.
+			 * For the moment we assume all collations reduce to the same
+			 * notion of equality.
+			 */
+
+			/* OK, see if the condition operand matches the index key */
+			if (rinfo->outer_is_left)
+				rexpr = get_rightop(rinfo->clause);
+			else
+				rexpr = get_leftop(rinfo->clause);
+
+			if (match_index_to_operand(rexpr, c, unique_ind))
+			{
+				found_in_restrictinfo = true;
+				break;
+			}
+		}
+		if (!found_in_restrictinfo)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * add_uniquekey_from_index
+ * 	Add a UniqueKey for the index, but only if every index column's Var
+ * 	exists in rel->reltarget
+ */
+static void
+add_uniquekey_from_index(RelOptInfo *rel, IndexOptInfo *unique_index)
+{
+	int	c;
+	List	*exprs = NIL;
+	List	*positions = NIL;
+	bool	guarantee = true;
+
+	/* Only add the index which exists in rel->reltarget */
+	for(c = 0; c < unique_index->ncolumns; c++)
+	{
+		ListCell	*lc;
+		bool	find_in_exprs = false;
+		/* To keep the uniquekey short, we only add it if it exists in rel->reltarget->exprs */
+		foreach(lc, rel->reltarget->exprs)
+		{
+			Var *var;
+			if (!IsA(lfirst(lc), Var))
+				continue;
+			var = lfirst_node(Var, lc);
+			if (var->varattno < InvalidAttrNumber)
+				/* System column */
+				continue;
+			/* Must check not null for unique index */
+			if (!bms_is_member(var->varattno, rel->not_null_cols))
+				guarantee = false;
+
+			if (match_index_to_operand((Node *)lfirst(lc), c, unique_index))
+			{
+				find_in_exprs = true;
+				exprs = lappend(exprs, lfirst(lc));
+				positions = lappend_int(positions, c+1);
+				break;
+			}
+		}
+		if (!find_in_exprs)
+			return;
+	}
+
+	if (exprs != NIL)
+	{
+		rel->uniquekeys = lappend(rel->uniquekeys,
+								  makeUniqueKey(exprs, positions, guarantee));
+	}
+}
+
+
+/*
+ * add_uniquekey_for_onerow
+ * If we are sure the relation returns only one row, then all the columns
+ * are unique
+ */
+static void
+add_uniquekey_for_onerow(RelOptInfo *rel)
+{
+	int c = 1;
+	ListCell	*lc;
+	foreach(lc,  rel->reltarget->exprs)
+	{
+		/*
+		 * Every column in this relation is unique since only one row is
+		 * returned.  No need to check whether it is a Var or nullable;
+		 * we can guarantee the uniqueness.
+		 */
+		rel->uniquekeys = lappend(rel->uniquekeys,
+								  makeUniqueKey(list_make1(lfirst(lc)),
+												list_make1_int(c),
+												true));
+		c++;
+	}
+}
+
+/*
+ * initililze_unqiuecontext_for_joinrel
+ * Return a List of UniqueKeyContexts for an inputrel; uniquekeys which turn
+ * out to be unusable are marked later via the useful flag
+ */
+static List *
+initililze_unqiuecontext_for_joinrel(RelOptInfo *joinrel,  RelOptInfo *inputrel)
+{
+	List	*res = NIL;
+	ListCell *lc;
+	foreach(lc,  inputrel->uniquekeys)
+	{
+		UniqueKeyContext context;
+		context = palloc(sizeof(struct UniqueKeyContextData));
+		context->uniquekey = lfirst_node(UniqueKey, lc);
+		context->added_to_joinrel = false;
+		context->useful = true;
+		res = lappend(res, context);
+	}
+	return res;
+}
+
+/*
+ * clause_sides_match_join
+ *	  Determine whether a join clause is of the right form to use in this join.
+ *
+ * We already know that the clause is a binary opclause referencing only the
+ * rels in the current join.  The point here is to check whether it has the
+ * form "outerrel_expr op innerrel_expr" or "innerrel_expr op outerrel_expr",
+ * rather than mixing outer and inner vars on either side.  If it matches,
+ * we set the transient flag outer_is_left to identify which side is which.
+ */
+static bool
+clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
+						Relids innerrelids)
+{
+	if (bms_is_subset(rinfo->left_relids, outerrelids) &&
+		bms_is_subset(rinfo->right_relids, innerrelids))
+	{
+		/* lefthand side is outer */
+		rinfo->outer_is_left = true;
+		return true;
+	}
+	else if (bms_is_subset(rinfo->left_relids, innerrelids) &&
+			 bms_is_subset(rinfo->right_relids, outerrelids))
+	{
+		/* righthand side is outer */
+		rinfo->outer_is_left = false;
+		return true;
+	}
+	return false;				/* no good for these input relations */
+}
+
+
+/*
+ * Partitioned table Unique Keys.
+ * A partitioned table's unique key is maintained when:
+ * 1. The index is unique as usual.
+ * 2. The index contains the partition key.
+ * 3. The same index exists on all the child rels; see simple_indexinfo_equal
+ *    for how we compare them.
+ */
+
+/* index_constains_partkey
+ * Return true if the index contains the partition key.
+ */
+static bool
+index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind)
+{
+	ListCell	*lc;
+	int	i;
+	Assert(IS_PARTITIONED_REL(partrel));
+
+	for(i = 0; i < partrel->part_scheme->partnatts; i++)
+	{
+		Node *part_expr = linitial(partrel->partexprs[i]);
+		bool found_in_index = false;
+		foreach(lc, ind->indextlist)
+		{
+			Expr *index_expr = lfirst_node(TargetEntry, lc)->expr;
+			if (equal(index_expr, part_expr))
+			{
+				found_in_index = true;
+				break;
+			}
+		}
+		if (!found_in_index)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * simple_indexinfo_equal
+ *
+ * Check whether the two indexes are the same.  The indexes here are COPIED
+ * from childrels with some tiny changes applied (see
+ * simple_copy_indexinfo_to_parent).
+ */
+
+static bool
+simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2)
+{
+	Size oid_cmp_len = sizeof(Oid) * ind1->ncolumns;
+	return ind1->ncolumns == ind2->ncolumns &&
+		ind1->unique == ind2->unique &&
+		memcmp(ind1->indexkeys, ind2->indexkeys, sizeof(int) * ind1->ncolumns) == 0 &&
+		memcmp(ind1->opfamily, ind2->opfamily, oid_cmp_len) == 0 &&
+		memcmp(ind1->opcintype, ind2->opcintype, oid_cmp_len) == 0 &&
+		memcmp(ind1->sortopfamily, ind2->sortopfamily, oid_cmp_len) == 0 &&
+		equal(ind1->indextlist, ind2->indextlist);
+}
+
+/*
+ * Copy these macros from copyfuncs.c since I don't want make
+ * simple_copy_indexinfo_to_parent public since it is a so customized copy.
+ */
+
+#define COPY_POINTER_FIELD(fldname, sz) \
+	do { \
+		Size	_size = (sz); \
+		newnode->fldname = palloc(_size); \
+		memcpy(newnode->fldname, from->fldname, _size); \
+	} while (0)
+
+#define COPY_NODE_FIELD(fldname) \
+	(newnode->fldname = copyObjectImpl(from->fldname))
+
+#define COPY_SCALAR_FIELD(fldname) \
+	(newnode->fldname = from->fldname)
+
+
+/*
+ * simple_copy_indexinfo_to_parent
+ * Copy the IndexOptInfo from a child rel to the parent, which will be used to
+ * 1. Test if the same index exists in all the childrels.
+ * 2. Test if the parentrel's reltarget/baserestrictinfo matches this index.
+ * The copied and modified index is only used in this scope.
+ */
+static IndexOptInfo *
+simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+								IndexOptInfo *from)
+{
+	IndexOptInfo *newnode = makeNode(IndexOptInfo);
+
+	COPY_SCALAR_FIELD(ncolumns);
+	COPY_SCALAR_FIELD(nkeycolumns);
+	COPY_SCALAR_FIELD(unique);
+	COPY_SCALAR_FIELD(immediate);
+	/* For indpred we just need to know whether it is NIL or not */
+	COPY_SCALAR_FIELD(indpred);
+	COPY_SCALAR_FIELD(predOK);
+	COPY_POINTER_FIELD(indexkeys, from->ncolumns * sizeof(int));
+	COPY_POINTER_FIELD(indexcollations, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opfamily, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opcintype, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(sortopfamily, from->ncolumns * sizeof(Oid));
+	COPY_NODE_FIELD(indextlist);
+
+	/*
+	 * We have to change the varnos here so that the later index matching
+	 * (like pk = 1 or against rel->reltarget) works on the parent rel.
+	 */
+	ChangeVarNodes((Node*) newnode->indextlist,
+				   from->rel->relid,
+				   parentrel->relid, 0);
+	newnode->rel = parentrel;
+	return newnode;
+}
+
+/*
+ * adjust_partition_unique_indexlist
+ *
+ * Check each index in the known global_unique_indexes to see whether it
+ * also exists in the given childrel; if not, it is removed
+ * from the list.
+ */
+static void
+adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+								  RelOptInfo *childrel,
+								  List **global_unique_indexes)
+{
+	ListCell	*lc, *lc2;
+	foreach(lc, *global_unique_indexes)
+	{
+		IndexOptInfo	*g_ind = lfirst_node(IndexOptInfo, lc);
+		bool found_in_child = false;
+
+		foreach(lc2, childrel->indexlist)
+		{
+			IndexOptInfo   *p_ind = lfirst_node(IndexOptInfo, lc2);
+			IndexOptInfo   *p_ind_copy;
+			if (!p_ind->unique || !p_ind->immediate ||
+				(p_ind->indpred != NIL && !p_ind->predOK))
+				continue;
+			p_ind_copy = simple_copy_indexinfo_to_parent(parentrel, p_ind);
+			if (simple_indexinfo_equal(p_ind_copy, g_ind))
+			{
+				found_in_child = true;
+				break;
+			}
+		}
+
+		if (!found_in_child)
+			/* The same index does not exist in this childrel; remove it */
+			*global_unique_indexes = foreach_delete_current(*global_unique_indexes, lc);
+	}
+}
+
+/* Helper function for grouprel/distinctrel */
+static void
+add_uniquekey_from_sortgroups(PlannerInfo *root, RelOptInfo *rel, List *sortgroups)
+{
+	Query *parse = root->parse;
+	ListCell *lc, *lc2;
+	List	*exprs = NIL, *colnos = NIL;
+	foreach(lc, sortgroups)
+	{
+		Index sortref = lfirst_node(SortGroupClause, lc)->tleSortGroupRef;
+		int c = 1;
+		foreach(lc2, parse->targetList)
+		{
+			TargetEntry *tle = lfirst_node(TargetEntry, lc2);
+			if (tle->ressortgroupref == sortref)
+			{
+				exprs = lappend(exprs, tle->expr);
+				colnos = lappend_int(colnos, c);
+			}
+			++c;
+		}
+	}
+	rel->uniquekeys = lappend(rel->uniquekeys, makeUniqueKey(exprs, colnos, true));
+}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0ff660284..b96c6290b7 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -37,7 +37,6 @@ static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
 								  Relids joinrelids);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
-static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
 								List *clause_list);
 static Oid	distinct_col_search(int colno, List *colnos, List *opids);
@@ -178,14 +177,6 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 
 	innerrel = find_base_rel(root, innerrelid);
 
-	/*
-	 * Before we go to the effort of checking whether any innerrel variables
-	 * are needed above the join, make a quick check to eliminate cases in
-	 * which we will surely be unable to prove uniqueness of the innerrel.
-	 */
-	if (!rel_supports_distinctness(root, innerrel))
-		return false;
-
 	/* Compute the relid set for the join we are considering */
 	joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
 
@@ -535,14 +526,6 @@ reduce_unique_semijoins(PlannerInfo *root)
 
 		innerrel = find_base_rel(root, innerrelid);
 
-		/*
-		 * Before we trouble to run generate_join_implied_equalities, make a
-		 * quick check to eliminate cases in which we will surely be unable to
-		 * prove uniqueness of the innerrel.
-		 */
-		if (!rel_supports_distinctness(root, innerrel))
-			continue;
-
 		/* Compute the relid set for the join we are considering */
 		joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
 
@@ -570,54 +553,6 @@ reduce_unique_semijoins(PlannerInfo *root)
 }
 
 
-/*
- * rel_supports_distinctness
- *		Could the relation possibly be proven distinct on some set of columns?
- *
- * This is effectively a pre-checking function for rel_is_distinct_for().
- * It must return true if rel_is_distinct_for() could possibly return true
- * with this rel, but it should not expend a lot of cycles.  The idea is
- * that callers can avoid doing possibly-expensive processing to compute
- * rel_is_distinct_for()'s argument lists if the call could not possibly
- * succeed.
- */
-static bool
-rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
-{
-	/* We only know about baserels ... */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * For a plain relation, we only know how to prove uniqueness by
-		 * reference to unique indexes.  Make sure there's at least one
-		 * suitable unique index.  It must be immediately enforced, and if
-		 * it's a partial index, it must match the query.  (Keep these
-		 * conditions in sync with relation_has_unique_index_for!)
-		 */
-		ListCell   *lc;
-
-		foreach(lc, rel->indexlist)
-		{
-			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
-
-			if (ind->unique && ind->immediate &&
-				(ind->indpred == NIL || ind->predOK))
-				return true;
-		}
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Query	   *subquery = root->simple_rte_array[rel->relid]->subquery;
-
-		/* Check if the subquery has any qualities that support distinctness */
-		if (query_supports_distinctness(subquery))
-			return true;
-	}
-	/* We have no proof rules for any other rtekinds. */
-	return false;
-}
 
 /*
  * rel_is_distinct_for
@@ -640,83 +575,34 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 static bool
 rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
 {
-	/*
-	 * We could skip a couple of tests here if we assume all callers checked
-	 * rel_supports_distinctness first, but it doesn't seem worth taking any
-	 * risk for.
-	 */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * Examine the indexes to see if we have a matching unique index.
-		 * relation_has_unique_index_for automatically adds any usable
-		 * restriction clauses for the rel, so we needn't do that here.
-		 */
-		if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
-			return true;
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Index		relid = rel->relid;
-		Query	   *subquery = root->simple_rte_array[relid]->subquery;
-		List	   *colnos = NIL;
-		List	   *opids = NIL;
-		ListCell   *l;
 
-		/*
-		 * Build the argument lists for query_is_distinct_for: a list of
-		 * output column numbers that the query needs to be distinct over, and
-		 * a list of equality operators that the output columns need to be
-		 * distinct according to.
-		 *
-		 * (XXX we are not considering restriction clauses attached to the
-		 * subquery; is that worth doing?)
-		 */
-		foreach(l, clause_list)
+	ListCell	*lc1, *lc2,  *lc3;
+	foreach(lc1,  rel->uniquekeys)
+	{
+		UniqueKey *uqk = lfirst_node(UniqueKey, lc1);
+		bool all_uqk_exprs_found = true;
+		foreach(lc2, uqk->exprs)
 		{
-			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
-			Oid			op;
-			Var		   *var;
-
-			/*
-			 * Get the equality operator we need uniqueness according to.
-			 * (This might be a cross-type operator and thus not exactly the
-			 * same operator the subquery would consider; that's all right
-			 * since query_is_distinct_for can resolve such cases.)  The
-			 * caller's mergejoinability test should have selected only
-			 * OpExprs.
-			 */
-			op = castNode(OpExpr, rinfo->clause)->opno;
-
-			/* caller identified the inner side for us */
-			if (rinfo->outer_is_left)
-				var = (Var *) get_rightop(rinfo->clause);
-			else
-				var = (Var *) get_leftop(rinfo->clause);
-
-			/*
-			 * We may ignore any RelabelType node above the operand.  (There
-			 * won't be more than one, since eval_const_expressions() has been
-			 * applied already.)
-			 */
-			if (var && IsA(var, RelabelType))
-				var = (Var *) ((RelabelType *) var)->arg;
-
-			/*
-			 * If inner side isn't a Var referencing a subquery output column,
-			 * this clause doesn't help us.
-			 */
-			if (!var || !IsA(var, Var) ||
-				var->varno != relid || var->varlevelsup != 0)
-				continue;
-
-			colnos = lappend_int(colnos, var->varattno);
-			opids = lappend_oid(opids, op);
+			Node *uq_expr = lfirst(lc2);
+			bool find_uq_exprs_in_clause_list = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *clause_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				if (equal(uq_expr, clause_expr))
+				{
+					find_uq_exprs_in_clause_list = true;
+					break;
+				}
+			}
+			if (!find_uq_exprs_in_clause_list)
+			{
+				all_uqk_exprs_found = false;
+				break;
+			}
 		}
-
-		if (query_is_distinct_for(subquery, colnos, opids))
+		if (all_uqk_exprs_found)
+			/* This UniqueKey is what we want */
 			return true;
 	}
 	return false;
@@ -976,13 +862,6 @@ innerrel_is_unique(PlannerInfo *root,
 	if (restrictlist == NIL)
 		return false;
 
-	/*
-	 * Make a quick check to eliminate cases in which we will surely be unable
-	 * to prove uniqueness of the innerrel.
-	 */
-	if (!rel_supports_distinctness(root, innerrel))
-		return false;
-
 	/*
 	 * Query the cache to see if we've managed to prove that innerrel is
 	 * unique for any subset of this outerrel.  We don't need an exact match,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e978b491f6..be78d061ae 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -830,6 +830,15 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		{
 			Node	   *qual = (Node *) lfirst(l);
 
+			/* set the not null info now */
+			ListCell	*lc;
+			foreach(lc, find_nonnullable_vars(qual))
+			{
+				Var *var = lfirst_node(Var, lc);
+				RelOptInfo *rel = root->simple_rel_array[var->varno];
+				if (var->varattno > InvalidAttrNumber)
+					rel->not_null_cols = bms_add_member(rel->not_null_cols, var->varattno);
+			}
 			distribute_qual_to_rels(root, qual,
 									false, below_outer_join, JOIN_INNER,
 									root->qual_security_level,
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..6ad73cb57b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -213,19 +213,6 @@ query_planner(PlannerInfo *root,
 	 */
 	fix_placeholder_input_needed_levels(root);
 
-	/*
-	 * Remove any useless outer joins.  Ideally this would be done during
-	 * jointree preprocessing, but the necessary information isn't available
-	 * until we've built baserel data structures and classified qual clauses.
-	 */
-	joinlist = remove_useless_joins(root, joinlist);
-
-	/*
-	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
-	 * Likewise, this can't be done until now for lack of needed info.
-	 */
-	reduce_unique_semijoins(root);
-
 	/*
 	 * Now distribute "placeholders" to base rels as needed.  This has to be
 	 * done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b65abf6046..8043fc4382 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2386,6 +2386,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		add_path(final_rel, path);
 	}
 
+	simple_copy_uniquekeys(current_rel, final_rel);
+
 	/*
 	 * Generate partial paths for final_rel, too, if outer query levels might
 	 * be able to make use of them.
@@ -3810,6 +3812,23 @@ create_grouping_paths(PlannerInfo *root,
 	Query	   *parse = root->parse;
 	RelOptInfo *grouped_rel;
 	RelOptInfo *partially_grouped_rel;
+	List	*required_unique_keys = NIL;
+
+	if (root->parse->groupingSets == NIL)
+	{
+		required_unique_keys  = get_sortgrouplist_exprs(parse->groupClause,
+														parse->targetList);
+		/*
+		 * If the groupby clauses are unique already, the grouping node is
+		 * not necessary if there are no aggregate functions.
+		 */
+		if (required_unique_keys != NIL &&
+			!parse->hasAggs &&
+			!parse->hasWindowFuncs &&
+			parse->havingQual == NULL &&
+			relation_has_uniquekeys_for(root, input_rel, required_unique_keys))
+			return input_rel;
+	}
 
 	/*
 	 * Create grouping relation to hold fully aggregated grouping and/or
@@ -3898,6 +3917,8 @@ create_grouping_paths(PlannerInfo *root,
 	}
 
 	set_cheapest(grouped_rel);
+
+	populate_grouprel_uniquekeys(root, grouped_rel);
 	return grouped_rel;
 }
 
@@ -4615,7 +4636,7 @@ create_window_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(window_rel);
-
+	simple_copy_uniquekeys(input_rel, window_rel);
 	return window_rel;
 }
 
@@ -4734,6 +4755,12 @@ create_distinct_paths(PlannerInfo *root,
 	bool		allow_hash;
 	Path	   *path;
 	ListCell   *lc;
+	List	   *required_unique_keys =  get_sortgrouplist_exprs(parse->distinctClause,
+																parse->targetList);
+
+	/* If the result is unique already, just return the input_rel directly */
+	if (relation_has_uniquekeys_for(root, input_rel, required_unique_keys))
+		return input_rel;
 
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
@@ -4912,7 +4939,7 @@ create_distinct_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(distinct_rel);
-
+	populate_distinctrel_uniquekeys(root, input_rel, distinct_rel);
 	return distinct_rel;
 }
 
@@ -5060,6 +5087,8 @@ create_ordered_paths(PlannerInfo *root,
 	 */
 	Assert(ordered_rel->pathlist != NIL);
 
+	simple_copy_uniquekeys(input_rel, ordered_rel);
+
 	return ordered_rel;
 }
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 951aed80e7..8aa4d24cb0 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -689,6 +689,8 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	/* Undo effects of possibly forcing tuple_fraction to 0 */
 	root->tuple_fraction = save_fraction;
 
+	/* Add the UniqueKeys */
+	populate_unionrel_uniquiekeys(root, result_rel);
 	return result_rel;
 }
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d82fc5ab8b..34d30b181c 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -460,6 +461,13 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->not_null_cols == NULL);
+	for(i = 0; i < relation->rd_att->natts; i++)
+	{
+		if (relation->rd_att->attrs[i].attnotnull)
+			rel->not_null_cols = bms_add_member(rel->not_null_cols, i+1);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 31d9aedeeb..d4de97016c 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -16,6 +16,7 @@
 
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 
 extern A_Expr *makeA_Expr(A_Expr_Kind kind, List *name,
@@ -105,4 +106,5 @@ extern GroupingSet *makeGroupingSet(GroupingSetKind kind, List *content, int loc
 
 extern VacuumRelation *makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols);
 
+extern UniqueKey *makeUniqueKey(List *exprs, List *positions, bool guarantee);
 #endif							/* MAKEFUNC_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8a76afe8cc..679cc4cc9c 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ceb809644..accec6df4e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -687,6 +687,7 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	Relids		not_null_cols; /* the not-null columns of this relation, numbered from 1 */
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
@@ -706,6 +707,7 @@ typedef struct RelOptInfo
 	QualCost	baserestrictcost;	/* cost of evaluating the above */
 	Index		baserestrict_min_security;	/* min security_level found in
 											 * baserestrictinfo */
+	List	   *uniquekeys;		/* List of UniqueKey */
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
 	bool		has_eclass_joins;	/* T means joininfo is incomplete */
@@ -1017,6 +1019,27 @@ typedef struct PathKey
 } PathKey;
 
 
+/*
+ * UniqueKey
+ *
+ * Represents the unique properties held by a RelOptInfo
+ * exprs is a list of exprs which are unique on the current RelOptInfo.
+ * positions is a list of the positions of the corresponding exprs in the
+ * current reloptinfo->reltarget. It will be used to translate the exprs'
+ * info in a subquery.
+ * guarantee: true means the uniqueness is guaranteed all the time; false if we
+ * can only guarantee the uniqueness without considering the null values. This
+ * field is necessary for remove_useless_join & reduce_unique_semijoins since
+ * these cases don't care about the null values.
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+	List	   *exprs;
+	List	   *positions;
+	bool		guarantee;
+} UniqueKey;
+
 /*
  * PathTarget
  *
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 14ea2766ad..5dfb93895c 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -528,6 +528,8 @@ extern bool list_member_ptr(const List *list, const void *datum);
 extern bool list_member_int(const List *list, int datum);
 extern bool list_member_oid(const List *list, Oid datum);
 
+extern bool list_all_members_in(const List *members, const List *target);
+
 extern List *list_delete(List *list, void *datum);
 extern List *list_delete_ptr(List *list, void *datum);
 extern List *list_delete_int(List *list, int datum);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..f1967d15c2 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -240,4 +240,41 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ */
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel,
+										List* unique_index_list);
+extern void populate_partitionedrel_uniquekeys(PlannerInfo *root,
+												RelOptInfo *rel,
+												List *childrels);
+extern void populate_distinctrel_uniquekeys(PlannerInfo *root,
+											RelOptInfo *inputrel,
+											RelOptInfo *distinctrel);
+extern void populate_grouprel_uniquekeys(PlannerInfo *root,
+										 RelOptInfo *grouprel);
+extern void populate_unionrel_uniquiekeys(PlannerInfo *root,
+										  RelOptInfo *unionrel);
+extern void simple_copy_uniquekeys(RelOptInfo *oldrel,
+								   RelOptInfo *newrel);
+extern void convert_subquery_uniquekeys(PlannerInfo *root,
+										RelOptInfo *currel,
+										RelOptInfo *sub_final_rel);
+extern void populate_joinrel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *rel1,
+										RelOptInfo *rel2,
+										List *restrictlist,
+										JoinType jointype);
+
+extern bool innerl_is_unique_v2(PlannerInfo *root,
+								RelOptInfo *outerrel,
+								RelOptInfo *innerrel,
+								List *restrictlist);
+extern bool relation_has_uniquekeys_for(PlannerInfo *root,
+										RelOptInfo *rel,
+										List *exprs);
+
 #endif							/* PATHS_H */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 14cdcfcca6..4a701af85b 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -870,14 +870,12 @@ explain (costs off)
   select distinct max(unique2) from tenk1;
                              QUERY PLAN                              
 ---------------------------------------------------------------------
- HashAggregate
-   Group Key: $0
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
                  Index Cond: (unique2 IS NOT NULL)
-   ->  Result
-(7 rows)
+(5 rows)
 
 select distinct max(unique2) from tenk1;
  max  
@@ -1036,7 +1034,7 @@ explain (costs off)
   select distinct min(f1), max(f1) from minmaxtest;
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
- Unique
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Merge Append
@@ -1059,10 +1057,7 @@ explain (costs off)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
-   ->  Sort
-         Sort Key: ($0), ($1)
-         ->  Result
-(26 rows)
+(23 rows)
 
 select distinct min(f1), max(f1) from minmaxtest;
  min | max 
@@ -1092,12 +1087,10 @@ create temp table t2 (x int, y int, z int, primary key (x, y));
 create temp table t3 (a int, b int, c int, primary key(a, b) deferrable);
 -- Non-primary-key columns can be removed from GROUP BY
 explain (costs off) select * from t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 -- No removal can happen if the complete PK is not present in GROUP BY
 explain (costs off) select a,c from t1 group by a,c,d;
@@ -1112,31 +1105,27 @@ explain (costs off) select a,c from t1 group by a,c,d;
 explain (costs off) select *
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.y
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.z
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Cannot optimize when PK is deferrable
 explain (costs off) select * from t3 group by a,b,c;
@@ -1161,12 +1150,10 @@ explain (costs off) select * from t1 group by a,b,c,d;
 
 -- Okay to remove columns if we're only querying the parent.
 explain (costs off) select * from only t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 create temp table p_t1 (
   a int,
@@ -1179,14 +1166,12 @@ create temp table p_t1_1 partition of p_t1 for values in(1);
 create temp table p_t1_2 partition of p_t1 for values in(2);
 -- Ensure we can remove non-PK columns for partitioned tables.
 explain (costs off) select * from p_t1 group by a,b,c,d;
-           QUERY PLAN           
---------------------------------
- HashAggregate
-   Group Key: p_t1.a, p_t1.b
-   ->  Append
-         ->  Seq Scan on p_t1_1
-         ->  Seq Scan on p_t1_2
-(5 rows)
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on p_t1_1
+   ->  Seq Scan on p_t1_2
+(3 rows)
 
 drop table t1 cascade;
 NOTICE:  drop cascades to table t1c
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 761376b007..6e5b8f83f4 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4414,37 +4414,25 @@ select d.* from d left join (select distinct * from b) s
 -- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
 -- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
 -- but this happens too late for join removal in the outer plan level.)
+-- XXX: since b.id is unique now, the group by clause is erased, so
+-- the join removal can happen now.
 explain (costs off)
 select d.* from d left join (select * from b group by b.id, b.c_id) s
   on d.a = s.id;
-                QUERY PLAN                
-------------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Group
-         Group Key: b.id
-         ->  Index Scan using b_pkey on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(8 rows)
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
 
 -- similarly, but keying off a DISTINCT clause
+-- XXX: we erase the distinct clause, so the join removal is ok as well.
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(9 rows)
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 11c6f50fbf..e9263d6151 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -306,3 +306,398 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+create table select_distinct_a(pk1 int, pk2 char(20),  uk1 char(20) not null,  uk2 int, e int, primary key(pk1, pk2));
+create unique index select_distinct_a_uk on select_distinct_a(uk1, uk2);
+create table select_distinct_b(a int, b char(20), pk1 char(20), pk2 int, e int, primary key(pk1, pk2));
+-- distinct erased since (pk1, pk2) is the primary key
+explain (costs off) select distinct * from select_distinct_a;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+(1 row)
+
+-- distinct can't be erased since we require that all the uk columns be not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a;
+             QUERY PLAN              
+-------------------------------------
+ HashAggregate
+   Group Key: uk1, uk2
+   ->  Seq Scan on select_distinct_a
+(3 rows)
+
+-- distinct erased since uk + not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 IS NOT NULL)
+(2 rows)
+
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 > 1;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 > 1)
+(2 rows)
+
+-- distinct erased due to group by
+explain (costs off) select distinct e from select_distinct_a group by e;
+             QUERY PLAN              
+-------------------------------------
+ HashAggregate
+   Group Key: e
+   ->  Seq Scan on select_distinct_a
+(3 rows)
+
+-- distinct erased due to the restrictinfo
+explain (costs off) select distinct uk1 from select_distinct_a where pk1 = 1 and pk2 = 'c';
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Index Scan using select_distinct_a_pkey on select_distinct_a
+   Index Cond: ((pk1 = 1) AND (pk2 = 'c'::bpchar))
+(2 rows)
+
+-- test join
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+insert into select_distinct_a values(1, 'a', 'a', 0, 1), (1, 'b', 'A', 0, 2), (3, 'c', 'c', 0, 3);
+insert into select_distinct_b values(1, 'a', 'a', 0, 1), (4, 'd', 'd', 0, 4), (1, 'e', 'e', 0, 5);
+ANALYZE select_distinct_a;
+ANALYZE select_distinct_b;
+-- Cartesian join
+explain (costs off) select distinct a.uk1, a.uk2, b.pk1, b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null;
+                 QUERY PLAN                  
+---------------------------------------------
+ Nested Loop
+   ->  Seq Scan on select_distinct_a a
+         Filter: (uk2 IS NOT NULL)
+   ->  Materialize
+         ->  Seq Scan on select_distinct_b b
+(5 rows)
+
+select distinct a.uk1 COLLATE "C", a.uk2, b.pk1 COLLATE "C", b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null order by 1, 2, 3, 4;
+         uk1          | uk2 |         pk1          | pk2 
+----------------------+-----+----------------------+-----
+ A                    |   0 | a                    |   0
+ A                    |   0 | d                    |   0
+ A                    |   0 | e                    |   0
+ a                    |   0 | a                    |   0
+ a                    |   0 | d                    |   0
+ a                    |   0 | e                    |   0
+ c                    |   0 | a                    |   0
+ c                    |   0 | d                    |   0
+ c                    |   0 | e                    |   0
+(9 rows)
+
+-- normal join
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_a t1, select_distinct_b t2 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Nested Loop
+   Join Filter: ((t1.pk2 = t2.pk1) AND (t1.e = t2.pk2))
+   ->  Seq Scan on select_distinct_a t1
+   ->  Materialize
+         ->  Seq Scan on select_distinct_b t2
+(5 rows)
+
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_b t2, select_distinct_a t1 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+                       QUERY PLAN                       
+--------------------------------------------------------
+ Nested Loop
+   Join Filter: ((t2.pk1 = t1.pk2) AND (t2.pk2 = t1.e))
+   ->  Seq Scan on select_distinct_b t2
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a t1
+(5 rows)
+
+-- left join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a);
+                 QUERY PLAN                  
+---------------------------------------------
+ Nested Loop Left Join
+   Join Filter: (a.pk1 = b.a)
+   ->  Seq Scan on select_distinct_a a
+   ->  Materialize
+         ->  Seq Scan on select_distinct_b b
+(5 rows)
+
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;;
+ pk1 |         pk2          |         pk1          | pk2 
+-----+----------------------+----------------------+-----
+   1 | a                    | a                    |   0
+   1 | a                    | e                    |   0
+   1 | b                    | a                    |   0
+   1 | b                    | e                    |   0
+   3 | c                    |                      |    
+(5 rows)
+
+-- right join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a);
+                 QUERY PLAN                  
+---------------------------------------------
+ Nested Loop Left Join
+   Join Filter: (a.pk1 = b.a)
+   ->  Seq Scan on select_distinct_b b
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a a
+(5 rows)
+
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+ pk1 |         pk2          |         pk1          | pk2 
+-----+----------------------+----------------------+-----
+   1 | a                    | a                    |   0
+   1 | a                    | e                    |   0
+   1 | b                    | a                    |   0
+   1 | b                    | e                    |   0
+     |                      | d                    |   0
+(5 rows)
+
+-- full join
+explain (costs off)  select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a);
+                 QUERY PLAN                  
+---------------------------------------------
+ Hash Full Join
+   Hash Cond: (a.pk1 = b.a)
+   ->  Seq Scan on select_distinct_a a
+   ->  Hash
+         ->  Seq Scan on select_distinct_b b
+(5 rows)
+
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+ pk1 |         pk2          |         pk1          | pk2 
+-----+----------------------+----------------------+-----
+   1 | a                    | a                    |   0
+   1 | a                    | e                    |   0
+   1 | b                    | a                    |   0
+   1 | b                    | e                    |   0
+   3 | c                    |                      |    
+     |                      | d                    |   0
+(6 rows)
+
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1 from select_distinct_a a inner join select_distinct_b b on (a.pk1 = b.a);
+                       QUERY PLAN                        
+---------------------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: a.pk1, a.pk2, b.pk1
+         ->  Nested Loop
+               Join Filter: (a.pk1 = b.a)
+               ->  Seq Scan on select_distinct_a a
+               ->  Materialize
+                     ->  Seq Scan on select_distinct_b b
+(8 rows)
+
+-- Semi/anti join
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 in (select a from select_distinct_b);
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Nested Loop Semi Join
+   Join Filter: (select_distinct_a.pk1 = select_distinct_b.a)
+   ->  Seq Scan on select_distinct_a
+   ->  Materialize
+         ->  Seq Scan on select_distinct_b
+(5 rows)
+
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 not in (select a from select_distinct_b);
+              QUERY PLAN               
+---------------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on select_distinct_b
+(4 rows)
+
+-- Test subquery
+explain (costs off) select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Nested Loop
+   Join Filter: (a.pk1 = select_distinct_b.a)
+   ->  Seq Scan on select_distinct_a a
+   ->  Materialize
+         ->  HashAggregate
+               Group Key: select_distinct_b.a
+               ->  Seq Scan on select_distinct_b
+(7 rows)
+
+select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a order by 1, 2, 3;
+ pk1 |         pk2          |         uk1          | uk2 | e | a 
+-----+----------------------+----------------------+-----+---+---
+   1 | a                    | a                    |   0 | 1 | 1
+   1 | b                    | A                    |   0 | 2 | 1
+(2 rows)
+
+explain (costs off) select distinct * from select_distinct_a a,  (select distinct a from select_distinct_b) b where a.pk1 = b.a;
+                   QUERY PLAN                    
+-------------------------------------------------
+ Nested Loop
+   Join Filter: (a.pk1 = select_distinct_b.a)
+   ->  Seq Scan on select_distinct_a a
+   ->  Materialize
+         ->  HashAggregate
+               Group Key: select_distinct_b.a
+               ->  Seq Scan on select_distinct_b
+(7 rows)
+
+select distinct * from select_distinct_a a, (select distinct a from select_distinct_b) b where a.pk1 = b.a order by 1 ,2, 3;
+ pk1 |         pk2          |         uk1          | uk2 | e | a 
+-----+----------------------+----------------------+-----+---+---
+   1 | a                    | a                    |   0 | 1 | 1
+   1 | b                    | A                    |   0 | 2 | 1
+(2 rows)
+
+-- Distinct On
+-- can't erase since pk2 is missed
+explain (costs off) select distinct on(pk1) pk1, pk2 from select_distinct_a;
+                QUERY PLAN                 
+-------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: pk1
+         ->  Seq Scan on select_distinct_a
+(4 rows)
+
+-- ok to erase
+explain (costs off) select distinct on(pk1, pk2) pk1, pk2 from select_distinct_a;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+(1 row)
+
+-- Test partitioned table
+create table dist_p (a int, b int not null, c int not null, d int) partition by  range (b);
+create table dist_p0 partition of dist_p for values from (1) to (10);
+create table dist_p1 partition of dist_p for values from (11) to (20);
+-- The combines UKs in target list
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.maxc from select_distinct_a t1, (select b, max(c) as maxc from dist_p group by b) t2;
+                   QUERY PLAN                   
+------------------------------------------------
+ Nested Loop
+   ->  HashAggregate
+         Group Key: dist_p.b
+         ->  Append
+               ->  Seq Scan on dist_p0 dist_p_1
+               ->  Seq Scan on dist_p1 dist_p_2
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a t1
+(8 rows)
+
+-- no uk for t2, so distinct is needed.
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: t1.pk1, t1.pk2, t2.b, t2.c
+   ->  Nested Loop
+         ->  Append
+               ->  Seq Scan on dist_p0 t2_1
+               ->  Seq Scan on dist_p1 t2_2
+         ->  Materialize
+               ->  Seq Scan on select_distinct_a t1
+(8 rows)
+
+-- create unqiue index on dist_p
+create unique index dist_p_uk_b_c on dist_p(b, c);
+-- (t2.b, t2.c) should be the UK
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                  QUERY PLAN                  
+----------------------------------------------
+ Nested Loop
+   ->  Append
+         ->  Seq Scan on dist_p0 t2_1
+         ->  Seq Scan on dist_p1 t2_2
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a t1
+(6 rows)
+
+drop index dist_p_uk_b_c;
+-- we also support create unqiue index on each child tables
+create unique index dist_p0_uk_bc on dist_p0(b, c);
+-- not ok, since dist_p1 no have exact same index
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: t1.pk1, t1.pk2, t2.b, t2.c
+   ->  Nested Loop
+         ->  Append
+               ->  Seq Scan on dist_p0 t2_1
+               ->  Seq Scan on dist_p1 t2_2
+         ->  Materialize
+               ->  Seq Scan on select_distinct_a t1
+(8 rows)
+
+create unique index dist_p1_uk_bc on dist_p1(b, c);
+-- OK now
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+                  QUERY PLAN                  
+----------------------------------------------
+ Nested Loop
+   ->  Append
+         ->  Seq Scan on dist_p0 t2_1
+         ->  Seq Scan on dist_p1 t2_2
+   ->  Materialize
+         ->  Seq Scan on select_distinct_a t1
+(6 rows)
+
+-- uk is same on all child tables, however it doesn't include the partkey, so not ok as well.
+create unique index dist_p0_uk_c on dist_p0(c);
+create unique index dist_p1_uk_c on dist_p1(c);
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.c from select_distinct_a t1, dist_p t2;
+                     QUERY PLAN                     
+----------------------------------------------------
+ HashAggregate
+   Group Key: t1.pk1, t1.pk2, t2.c
+   ->  Nested Loop
+         ->  Append
+               ->  Seq Scan on dist_p0 t2_1
+               ->  Seq Scan on dist_p1 t2_2
+         ->  Materialize
+               ->  Seq Scan on select_distinct_a t1
+(8 rows)
+
+drop table dist_p;
+-- test some view.
+create view distinct_v1 as select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+explain (costs off) select * from distinct_v1;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 IS NOT NULL)
+(2 rows)
+
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) select * from distinct_v1;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: select_distinct_a.uk1, select_distinct_a.uk2
+         ->  Seq Scan on select_distinct_a
+               Filter: (uk2 IS NOT NULL)
+(5 rows)
+
+alter table select_distinct_a alter column uk1 set not null;
+-- test generic plan
+prepare pt as select * from distinct_v1;
+explain (costs off)  execute pt;
+          QUERY PLAN           
+-------------------------------
+ Seq Scan on select_distinct_a
+   Filter: (uk2 IS NOT NULL)
+(2 rows)
+
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) execute pt;
+                           QUERY PLAN                           
+----------------------------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: select_distinct_a.uk1, select_distinct_a.uk2
+         ->  Seq Scan on select_distinct_a
+               Filter: (uk2 IS NOT NULL)
+(5 rows)
+
+drop view distinct_v1;
+drop table select_distinct_a;
+drop table select_distinct_b;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 5fc6617369..d6cc8fa845 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1515,11 +1515,14 @@ select d.* from d left join (select distinct * from b) s
 -- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
 -- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
 -- but this happens too late for join removal in the outer plan level.)
+-- XXXX: since b.id is unqiue now so the group by cluase is erased, so
+-- the join removal can happen now.
 explain (costs off)
 select d.* from d left join (select * from b group by b.id, b.c_id) s
   on d.a = s.id;
 
 -- similarly, but keying off a DISTINCT clause
+-- XXX we erase distinct clause, so the joinremoval is ok as well.
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 33102744eb..08d1e35095 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -135,3 +135,128 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+create table select_distinct_a(pk1 int, pk2 char(20),  uk1 char(20) not null,  uk2 int, e int, primary key(pk1, pk2));
+create unique index select_distinct_a_uk on select_distinct_a(uk1, uk2);
+create table select_distinct_b(a int, b char(20), pk1 char(20), pk2 int, e int, primary key(pk1, pk2));
+
+-- distinct erased since (pk1, pk2)
+explain (costs off) select distinct * from select_distinct_a;
+
+-- distinct can't be reased since since we required all the uk must be not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a;
+
+-- distinct ereased since uk + not null
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+explain (costs off) select distinct uk1, uk2 from select_distinct_a where uk2 > 1;
+
+-- distinct erased due to group by
+explain (costs off) select distinct e from select_distinct_a group by e;
+
+-- distinct erased due to the restirctinfo
+explain (costs off) select distinct uk1 from select_distinct_a where pk1 = 1 and pk2 = 'c';
+
+-- test join
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+insert into select_distinct_a values(1, 'a', 'a', 0, 1), (1, 'b', 'A', 0, 2), (3, 'c', 'c', 0, 3);
+insert into select_distinct_b values(1, 'a', 'a', 0, 1), (4, 'd', 'd', 0, 4), (1, 'e', 'e', 0, 5);
+
+ANALYZE select_distinct_a;
+ANALYZE select_distinct_b;
+
+-- Cartesian join
+explain (costs off) select distinct a.uk1, a.uk2, b.pk1, b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null;
+select distinct a.uk1 COLLATE "C", a.uk2, b.pk1 COLLATE "C", b.pk2 from select_distinct_a a, select_distinct_b b where a.uk2 is not null order by 1, 2, 3, 4;
+
+-- normal join
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_a t1, select_distinct_b t2 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+explain (costs off) select distinct t1.pk1, t1.pk2 from select_distinct_b t2, select_distinct_a t1 where t1.pk2 = t2.pk1 and t1.e = t2.pk2;
+
+-- left join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a);
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a left join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;;
+
+-- right join
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a);
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a right join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+
+-- full join
+explain (costs off)  select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a);
+select distinct a.pk1, a.pk2, b.pk1, b.pk2 from select_distinct_a a full outer join select_distinct_b b on (a.pk1 = b.a) order by 1, 2, 3, 4;
+
+explain (costs off) select distinct a.pk1, a.pk2, b.pk1 from select_distinct_a a inner join select_distinct_b b on (a.pk1 = b.a);
+
+
+-- Semi/anti join
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 in (select a from select_distinct_b);
+explain (costs off) select distinct pk1, pk2 from select_distinct_a where pk1 not in (select a from select_distinct_b);
+
+
+-- Test subquery
+explain (costs off) select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a;
+select distinct * from select_distinct_a a,  (select a from select_distinct_b group by a) b where a.pk1 = b.a order by 1, 2, 3;
+
+explain (costs off) select distinct * from select_distinct_a a,  (select distinct a from select_distinct_b) b where a.pk1 = b.a;
+select distinct * from select_distinct_a a, (select distinct a from select_distinct_b) b where a.pk1 = b.a order by 1 ,2, 3;
+
+-- Distinct On
+-- can't erase since pk2 is missed
+explain (costs off) select distinct on(pk1) pk1, pk2 from select_distinct_a;
+-- ok to erase
+explain (costs off) select distinct on(pk1, pk2) pk1, pk2 from select_distinct_a;
+
+
+-- Test partitioned table
+create table dist_p (a int, b int not null, c int not null, d int) partition by  range (b);
+
+create table dist_p0 partition of dist_p for values from (1) to (10);
+create table dist_p1 partition of dist_p for values from (11) to (20);
+
+-- The combines UKs in target list
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.maxc from select_distinct_a t1, (select b, max(c) as maxc from dist_p group by b) t2;
+
+-- no uk for t2, so distinct is needed.
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+
+-- create unqiue index on dist_p
+create unique index dist_p_uk_b_c on dist_p(b, c);
+
+-- (t2.b, t2.c) should be the UK
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+
+drop index dist_p_uk_b_c;
+
+-- we also support create unqiue index on each child tables
+create unique index dist_p0_uk_bc on dist_p0(b, c);
+-- not ok, since dist_p1 no have exact same index
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+create unique index dist_p1_uk_bc on dist_p1(b, c);
+-- OK now
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.b, t2.c from select_distinct_a t1, dist_p t2;
+
+-- uk is same on all child tables, however it doesn't include the partkey, so not ok as well.
+create unique index dist_p0_uk_c on dist_p0(c);
+create unique index dist_p1_uk_c on dist_p1(c);
+explain (costs off) select distinct t1.pk1, t1.pk2,  t2.c from select_distinct_a t1, dist_p t2;
+
+drop table dist_p;
+
+-- test some view.
+create view distinct_v1 as select distinct uk1, uk2 from select_distinct_a where uk2 is not null;
+explain (costs off) select * from distinct_v1;
+
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) select * from distinct_v1;
+
+alter table select_distinct_a alter column uk1 set not null;
+
+-- test generic plan
+prepare pt as select * from distinct_v1;
+explain (costs off)  execute pt;
+alter table select_distinct_a alter column uk1 drop not null;
+explain (costs off) execute pt;
+
+drop view distinct_v1;
+drop table select_distinct_a;
+drop table select_distinct_b;
-- 
2.21.0

#5David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#4)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Sun, 29 Mar 2020 at 20:50, Andy Fan <zhihui.fan1213@gmail.com> wrote:

Some other changes made in the new patch:
1. Fixed bug for UniqueKey calculation for OUTER join.
2. Fixed some typo error in comments.
3. Renamed the field "grantee" as "guarantee".

I've had a look over this patch. Thank you for doing further work on it.

I've noted down the following during my read of the code:

1. There seem to be some cases where joins are no longer being
detected as unique. This is evident in postgres_fdw.out. We shouldn't
be regressing any of these cases.

2. The following change does not seem like it should be part of this
patch. I understand you perhaps have done as you think it will
improve the performance of checking if an expression is in a list of
expressions.

- COMPARE_SCALAR_FIELD(varno);
+ /* Compare varattno first since it has higher selectivity than varno */
  COMPARE_SCALAR_FIELD(varattno);
+ COMPARE_SCALAR_FIELD(varno);

If you think that is true, then please do it as a separate effort and
provide benchmarks with your findings.

3. list_all_members_in. I think this would be better named as
list_is_subset. Please follow the lead of bms_is_subset().
Additionally, you should Assert that IsPointerList is true as there's
nothing else to indicate that it can't be used for an int or Oid list.
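
To make the suggested semantics concrete, a subset test following the
bms_is_subset() convention (first argument tested for containment in the
second) might behave like this — Python is used here purely as an
illustration; the real implementation would of course be C over
PostgreSQL List nodes:

```python
def list_is_subset(sub, superset):
    """Return True if every member of `sub` also appears in `superset`.

    Mirrors bms_is_subset(): the first argument is tested for
    containment in the second.
    """
    return all(item in superset for item in sub)

# The empty list is a subset of anything, and any list is a subset of itself.
assert list_is_subset([], ["a", "b"])
assert list_is_subset(["a"], ["a", "b"])
assert not list_is_subset(["c"], ["a", "b"])
```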

4. guarantee is not a very good name for the field in UniqueKey.
Maybe something like is_not_null?

5. I think you should be performing a bms_del_member during join
removal rather than removing this Assert()

- Assert(bms_equal(rel->relids, root->all_baserels));

FWIW, it's far from perfect that you've needed to delay the left join
removal, but I do understand why you've done it. It's also far from
perfect that you're including removed relations in the
total_table_pages calculation. c6e4133fae1 took some measures to
improve this calculation and this is making it worse again.

6. Can you explain why you moved the populate_baserel_uniquekeys()
call out of set_plain_rel_size()?

7. I don't think the removal of rel_supports_distinctness() is
warranted. Is it not ok to check if the relation has any uniquekeys?
It's possible, particularly in join_is_removable, that this can save
quite a large amount of effort.

8. Your spelling of unique is incorrect in many places:

src/backend/nodes/makefuncs.c: * makeUnqiueKey
src/backend/optimizer/path/uniquekeys.c:static List
*initililze_unqiuecontext_for_joinrel(RelOptInfo *joinrel,
src/backend/optimizer/path/uniquekeys.c: * check if combination of
unqiuekeys from both side is still useful for us,
src/backend/optimizer/path/uniquekeys.c: outerrel_uniquekey_ctx
= initililze_unqiuecontext_for_joinrel(joinrel, outerrel);
src/backend/optimizer/path/uniquekeys.c: innerrel_uniquekey_ctx
= initililze_unqiuecontext_for_joinrel(joinrel, innerrel);
src/backend/optimizer/path/uniquekeys.c: * we need to convert the
UnqiueKey from sub_final_rel to currel via the positions info in
src/backend/optimizer/path/uniquekeys.c: ctx->pos =
pos; /* the position in current targetlist, will be used to set
UnqiueKey */
src/backend/optimizer/path/uniquekeys.c: * Check if Unqiue key of the
innerrel is valid after join. innerrel's UniqueKey
src/backend/optimizer/path/uniquekeys.c: * initililze_unqiuecontext_for_joinrel
src/backend/optimizer/path/uniquekeys.c: * all the unqiuekeys which
are not possible to use later
src/backend/optimizer/path/uniquekeys.c:initililze_unqiuecontext_for_joinrel(RelOptInfo
*joinrel, RelOptInfo *inputrel)
src/backend/optimizer/plan/analyzejoins.c: /*
This UnqiueKey is what we want */
src/backend/optimizer/plan/planner.c: /* If we the result if unqiue
already, we just return the input_rel directly */
src/include/nodes/pathnodes.h: * exprs is a list of exprs which is
unqiue on current RelOptInfo.
src/test/regress/expected/join.out:-- XXXX: since b.id is unqiue now
so the group by cluase is erased, so
src/test/regress/expected/select_distinct.out:-- create unqiue index on dist_p
src/test/regress/expected/select_distinct.out:-- we also support
create unqiue index on each child tables
src/test/regress/sql/join.sql:-- XXXX: since b.id is unqiue now so the
group by cluase is erased, so
src/test/regress/sql/select_distinct.sql:-- create unqiue index on dist_p
src/test/regress/sql/select_distinct.sql:-- we also support create
unqiue index on each child tables

9. A few things wrong with the following fragment:

/* set the not null info now */
ListCell *lc;
foreach(lc, find_nonnullable_vars(qual))
{
Var *var = lfirst_node(Var, lc);
RelOptInfo *rel = root->simple_rel_array[var->varno];
if (var->varattno > InvalidAttrNumber)
rel->not_null_cols = bms_add_member(rel->not_null_cols, var->varattno);
}

a. including a function call in the foreach macro is not a practise
that we really follow. It's true that the macro now assigns the 2nd
param to a variable. Previous to 1cff1b95ab6 this was not the case and
it's likely best not to leave any bad examples around that code which
might get backported might follow.
b. We generally subtract InvalidAttrNumber from varattno when
including in a Bitmapset.
c. not_null_cols is not well named. I think notnullattrs
d. not_null_cols should not be a Relids type, it should be Bitmapset.

10. add_uniquekey_for_onerow() seems pretty wasteful. Is there really
a need to add each item in the rel's targetlist to the uniquekey list?
What if we just add an empty list to the unique keys, that way if we
need to test if some expr is a superset of any uniquekey, then we'll
see it is as any set is a superset of an empty set. Likely the empty
set of uniquekeys should be the only one in the rel's uniquekey list.
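
The set-theory point above can be sketched as follows (an illustrative
Python sketch of the suggestion, not PostgreSQL code; the helper name
covers_any_uniquekey is hypothetical):

```python
# Represent a one-row relation's unique key as the empty set.  Testing
# "is this expression list a superset of any uniquekey?" then succeeds
# trivially, because every set is a superset of the empty set.
def covers_any_uniquekey(exprs, uniquekeys):
    return any(set(uk) <= set(exprs) for uk in uniquekeys)

one_row_uniquekeys = [[]]                 # empty key: at most one row
assert covers_any_uniquekey(["a"], one_row_uniquekeys)
assert covers_any_uniquekey([], one_row_uniquekeys)

normal_uniquekeys = [["pk1", "pk2"]]
assert not covers_any_uniquekey(["pk1"], normal_uniquekeys)
assert covers_any_uniquekey(["pk1", "pk2", "x"], normal_uniquekeys)
```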

11. In create_distinct_paths() the code is now calling
get_sortgrouplist_exprs() multiple times with the same input. I think
it would be better to just call it once and set the result in a local
variable.

12. The comment in the code below is not true. The List contains
Lists, of which contain UniqueKeys

List *uniquekeys; /* List of UniqueKey */

13. I'm having trouble parsing the final sentence in:

+ * can only guarantee the uniqueness without considering the null values. This
+ * field is necessary for remove_useless_join & reduce_unique_semijions since
+ * these cases don't care about the null values.

Why is the field which stores the nullability of the key required for
code that does not care about the nullability of the key?

Also please check your spelling of the word "join"

14. In the following fragment, instead of using i+1, please assign the
FormData_pg_attribute to a variable named attr and use attr->attnum.
Also, please see what I mentioned above about subtracting
InvalidAttrNumber

+ rel->not_null_cols = bms_add_member(rel->not_null_cols, i+1);

15. The tests you've changed the expected outcome of in join.out
should be updated so that the GROUP BY and DISTINCT clause is not
removed. This will allow the test to continue testing what it was
intended to test. You can do this by changing the columns in the GROUP
BY clause so that the new code does not find uniquekeys for those
columns.

16. The tests in aggregates.out are in a similar situation. There are
various tests trying to ensure that remove_useless_groupby_columns()
does what it's meant to do. You can modify these tests to add a join
which is non-unique to effectively duplicate the PK column.

17. In your select_distinct tests, can you move away from naming the
tables starting with select_distinct? It makes reading queries pretty
hard.

e.g. explain (costs off) select distinct uk1, uk2 from
select_distinct_a where uk2 is not null;

When I first glanced at that, I failed to see the underscores and the
query looked invalid.

18. Check the spelling of "erased". You have it spelt as "ereased" in
a couple of locations.

19. Please pay attention to the capitalisation of SQL keywords in the
test files you've modified. I understand we're very inconsistent in
this department in general, but we do at least try not to mix
capitalisation within the same file. Basically, please upper case the
keywords in select_distinct.sql

20. In addition to the above, please try to wrap long SQL lines so
they're below 80 chars.

I'll review the patch in more detail once the above points have been addressed.

David

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#5)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

[ not a review, just some drive-by comments on David's comments ]

David Rowley <dgrowleyml@gmail.com> writes:

2. The following change does not seem like it should be part of this
patch. I understand you perhaps have done as you think it will
improve the performance of checking if an expression is in a list of
expressions.

- COMPARE_SCALAR_FIELD(varno);
+ /* Compare varattno first since it has higher selectivity than varno */
COMPARE_SCALAR_FIELD(varattno);
+ COMPARE_SCALAR_FIELD(varno);

If you think that is true, then please do it as a separate effort and
provide benchmarks with your findings.

By and large, I'd reject such micro-optimizations on their face.
The rule in the nodes/ support files is to list fields in the same
order they're declared in. There is no chance that it's worth
deviating from that for this.

I can believe that there'd be value in, say, comparing all
scalar fields before all non-scalar ones. But piecemeal hacks
wouldn't be the way to handle that either. In any case, I'd
prefer to implement such a plan within the infrastructure to
auto-generate these files that Andres keeps muttering about.

a. including a function call in the foreach macro is not a practise
that we really follow. It's true that the macro now assigns the 2nd
param to a variable. Previous to 1cff1b95ab6 this was not the case and
it's likely best not to leave any bad examples around that code which
might get backported might follow.

No, I think you're misremembering. foreach's second arg is
single-evaluation in all branches. There were some preliminary
versions of 1cff1b95ab6 in which it would not have been, but that
was sufficiently dangerous that I found a way to get rid of it.

b. We generally subtract InvalidAttrNumber from varattno when
including in a Bitmapset.

ITYM FirstLowInvalidHeapAttributeNumber, but yeah. Otherwise
the code fails on system columns, and there's seldom a good
reason to risk that.
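
The reason for the offset can be sketched like so (an illustrative
Python sketch, not PostgreSQL code; the constant's value of -7 is taken
from recent PostgreSQL sources and may differ across versions):

```python
# System columns such as ctid have attno <= 0, but a bitmapset can only
# hold non-negative members.  Subtracting FirstLowInvalidHeapAttributeNumber
# maps every valid attno, system or user, to a distinct non-negative index.
FIRST_LOW_INVALID_HEAP_ATTRIBUTE_NUMBER = -7

def attno_to_bms_member(attno):
    # Only attnos above the low invalid bound are legal.
    assert attno > FIRST_LOW_INVALID_HEAP_ATTRIBUTE_NUMBER
    return attno - FIRST_LOW_INVALID_HEAP_ATTRIBUTE_NUMBER

assert attno_to_bms_member(-1) == 6   # ctid, a system column
assert attno_to_bms_member(0) == 7    # whole-row reference
assert attno_to_bms_member(1) == 8    # first user column
```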

regards, tom lane

#7Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#5)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Thanks, David, for your time. I will address every item you mentioned
in the updated patch; for now I will just comment on some of them.

1. There seem to be some cases where joins are no longer being
detected as unique. This is evident in postgres_fdw.out. We shouldn't
be regressing any of these cases.

You are correct; the issue here is that I didn't distinguish the one-row
case in the UniqueKey struct. When an outer relation is joined with a
relation which has only one row, at most one row can match each outer row.
The only-one-row cases in postgres_fdw.out come from aggregate calls. I
will add one more field, "bool onerow", to the UniqueKey struct, and also
try your optimization suggestion for the onerow UniqueKey.
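
The one-row observation can be illustrated with a tiny sketch (Python,
purely for illustration — the variable names are hypothetical and this
is not how the planner represents relations):

```python
# Joining against a relation known to contain (at most) one row cannot
# duplicate rows of the other side, so the other side's unique keys
# survive the join.
outer_rows = [(1, "a"), (2, "b"), (3, "c")]   # unique on column 0
one_row_rel = [(42,)]                         # proven to have one row

joined = [o + i for o in outer_rows for i in one_row_rel]

keys = [row[0] for row in joined]
assert len(keys) == len(set(keys))            # column 0 is still unique
```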

2. The following change does not seem like it should be part of this
patch. I understand you perhaps have done as you think it will
improve the performance of checking if an expression is in a list of
expressions.

- COMPARE_SCALAR_FIELD(varno);
+ /* Compare varattno first since it has higher selectivity than varno */
COMPARE_SCALAR_FIELD(varattno);
+ COMPARE_SCALAR_FIELD(varno);

I did hesitate when I made this change, so I will roll it back in the
next version of the patch.

If you think that is true, then please do it as a separate effort and
provide benchmarks with your findings.

3. list_all_members_in. I think this would be better named as
list_is_subset. Please follow the lead of bms_is_subset().
Additionally, you should Assert that IsPointerList is true as there's
nothing else to indicate that it can't be used for an int or Oid list.

4. guarantee is not a very good name for the field in UniqueKey.
Maybe something like is_not_null?

5. I think you should be performing a bms_del_member during join
removal rather than removing this Assert()

- Assert(bms_equal(rel->relids, root->all_baserels));

FWIW, it's far from perfect that you've needed to delay the left join
removal, but I do understand why you've done it. It's also far from
perfect that you're including removed relations in the
total_table_pages calculation. c6e4133fae1 took some measures to
improve this calculation and this is making it worse again.

Since join removal depends on the UniqueKey, which in the current code
has to be calculated after the total_table_pages calculation, the delay
is something I must accept. But if the relation is not removable there is
no waste at all, and if it is removable the gain will be much higher than
the loss. I'm not sure this should be a concern.

Actually it looks like the current remove_useless_join has some limits
and can't remove a joinrel; I still haven't figured out why. In the past
we had only a limited ability to detect uniqueness after a join, so that
was acceptable. Since we have that ability now, this may be another
opportunity to improve the join_is_removable function, but I'd rather not
put such a thing in this patch.

Since you said "far from perfect" twice for this point and I only see one
reason (we may plan a node which we remove later), did I miss the other
one?

6. Can you explain why you moved the populate_baserel_uniquekeys()

call out of set_plain_rel_size()?

This is to be consistent with populate_partitionedrel_uniquekeys, which
is called from set_append_rel_pathlist.

7. I don't think the removal of rel_supports_distinctness() is
warranted. Is it not ok to check if the relation has any uniquekeys?

I think this is a good suggestion. I will follow that.

8. Your spelling of unique is incorrect in many places:

src/backend/nodes/makefuncs.c: * makeUnqiueKey
src/backend/optimizer/path/uniquekeys.c:static List
*initililze_unqiuecontext_for_joinrel(RelOptInfo *joinrel,
src/backend/optimizer/path/uniquekeys.c: * check if combination of
unqiuekeys from both side is still useful for us,
src/backend/optimizer/path/uniquekeys.c: outerrel_uniquekey_ctx
= initililze_unqiuecontext_for_joinrel(joinrel, outerrel);
src/backend/optimizer/path/uniquekeys.c: innerrel_uniquekey_ctx
= initililze_unqiuecontext_for_joinrel(joinrel, innerrel);
src/backend/optimizer/path/uniquekeys.c: * we need to convert the
UnqiueKey from sub_final_rel to currel via the positions info in
src/backend/optimizer/path/uniquekeys.c: ctx->pos =
pos; /* the position in current targetlist, will be used to set
UnqiueKey */
src/backend/optimizer/path/uniquekeys.c: * Check if Unqiue key of the
innerrel is valid after join. innerrel's UniqueKey
src/backend/optimizer/path/uniquekeys.c: *
initililze_unqiuecontext_for_joinrel
src/backend/optimizer/path/uniquekeys.c: * all the unqiuekeys which
are not possible to use later

src/backend/optimizer/path/uniquekeys.c:initililze_unqiuecontext_for_joinrel(RelOptInfo
*joinrel, RelOptInfo *inputrel)
src/backend/optimizer/plan/analyzejoins.c: /*
This UnqiueKey is what we want */
src/backend/optimizer/plan/planner.c: /* If we the result if unqiue
already, we just return the input_rel directly */
src/include/nodes/pathnodes.h: * exprs is a list of exprs which is
unqiue on current RelOptInfo.
src/test/regress/expected/join.out:-- XXXX: since b.id is unqiue now
so the group by cluase is erased, so
src/test/regress/expected/select_distinct.out:-- create unqiue index on
dist_p
src/test/regress/expected/select_distinct.out:-- we also support
create unqiue index on each child tables
src/test/regress/sql/join.sql:-- XXXX: since b.id is unqiue now so the
group by cluase is erased, so
src/test/regress/sql/select_distinct.sql:-- create unqiue index on dist_p
src/test/regress/sql/select_distinct.sql:-- we also support create
unqiue index on each child tables

9. A few things wrong with the following fragment:

/* set the not null info now */
ListCell *lc;
foreach(lc, find_nonnullable_vars(qual))
{
Var *var = lfirst_node(Var, lc);
RelOptInfo *rel = root->simple_rel_array[var->varno];
if (var->varattno > InvalidAttrNumber)
rel->not_null_cols = bms_add_member(rel->not_null_cols, var->varattno);
}

a. including a function call in the foreach macro is not a practise
that we really follow. It's true that the macro now assigns the 2nd
param to a variable. Previous to 1cff1b95ab6 this was not the case and
it's likely best not to leave any bad examples around that code which
might get backported might follow.
b. We generally subtract InvalidAttrNumber from varattno when
including in a Bitmapset.
c. not_null_cols is not well named. I think notnullattrs
d. not_null_cols should not be a Relids type, it should be Bitmapset.

If it were a plain Bitmapset, we would usually have to pass it with "&".
Is that our practice?

10. add_uniquekey_for_onerow() seems pretty wasteful. Is there really
a need to add each item in the rel's targetlist to the uniquekey list?
What if we just add an empty list to the unique keys, that way if we
need to test if some expr is a superset of any uniquekey, then we'll
see it is as any set is a superset of an empty set. Likely the empty
set of uniquekeys should be the only one in the rel's uniquekey list.

11. In create_distinct_paths() the code is now calling
get_sortgrouplist_exprs() multiple times with the same input. I think
it would be better to just call it once and set the result in a local
variable.

12. The comment in the code below is not true. The List contains
Lists, of which contain UniqueKeys

List *uniquekeys; /* List of UniqueKey */

It is a list of UniqueKeys; each UniqueKey can have a list of exprs.

13. I'm having trouble parsing the final sentence in:

+ * can only guarantee the uniqueness without considering the null values.
This
+ * field is necessary for remove_useless_join & reduce_unique_semijions
since
+ * these cases don't care about the null values.

Why is the field which stores the nullability of the key required for
code that does not care about the nullability of the key?

The guarantee field is introduced for cases like the following:

create table t1 (a int primary key, b int);
create table t2 (a int primary key, b int);
select .. from t1, (select b from t2 group by t2) t2 ..;

-- b is nullable, so t2(b) can't be a normal UniqueKey (b may still have
-- duplicated rows) even after:
create unique index t2_uk_b on t2(b);

-- but the left join can still be removed, since t2.b has a unique index
-- and the nullability doesn't matter here:
select t1.* from t1 left join t2 on (t1.b = t2.b);

Do you think we can do some optimization in this case? I don't understand
your question well.

15. The tests you've changed the expected outcome of in join.out
should be updated so that the GROUP BY and DISTINCT clause is not
removed. This will allow the test to continue testing what it was
intended to test. You can do this by changing the columns in the GROUP
BY clause so that the new code does not find uniquekeys for those
columns.

Thanks for your explanation, very impressive!

16. The tests in aggregates.out are in a similar situation. There are
various tests trying to ensure that remove_useless_groupby_columns()
does what it's meant to do. You can modify these tests to add a join
which is non-unique to effectively duplicate the PK column.

17. In your select_distinct tests, can you move away from naming the
tables starting with select_distinct? It makes reading queries pretty
hard.

e.g. explain (costs off) select distinct uk1, uk2 from
select_distinct_a where uk2 is not null;

When I first glanced that, I failed to see the underscores and the
query looked invalid.

18. Check the spelling if "erased". You have it spelt as "ereased" in
a couple of locations.

OK, I just installed a spell-check plugin for my editor; I hope it will
catch such errors next time.

19. Please pay attention to the capitalisation of SQL keywords in the
test files you've modified. I understand we're very inconsistent in
this department in general, but we do at least try not to mix
capitalisation within the same file. Basically, please upper case the
keywords in select_distinct.sql

20. In addition to the above, please try to wrap long SQL lines so
they're below 80 chars.

I'll review the patch in more detail once the above points have been
addressed.

David

#8David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#7)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Wed, 1 Apr 2020 at 13:45, Andy Fan <zhihui.fan1213@gmail.com> wrote:

5. I think you should be performing a bms_del_member during join
removal rather than removing this Assert()

- Assert(bms_equal(rel->relids, root->all_baserels));

FWIW, it's far from perfect that you've needed to delay the left join
removal, but I do understand why you've done it. It's also far from
perfect that you're including removed relations in the
total_table_pages calculation. c6e4133fae1 took some measures to
improve this calculation and this is making it worse again.

Since join removal depends on the UniqueKey, which in the current code has to be
calculated after the total_table_pages calculation, the delay is something I must
accept. But if the relation is not removable there is no waste at all, and if it
is removable the gain will be much higher than the loss. I'm not sure this should
be a concern.

The reason join removals was done so early in planning before was to
save the planner from having to do additional work for relations which
were going to be removed later. For example, building path lists.

Actually it looks like the current remove_useless_join has some limits and can't
remove a joinrel; I still haven't figured out why. In the past we had only a limited
ability to detect uniqueness after a join, so that was acceptable. Since we have that
ability now, this may be another opportunity to improve the join_is_removable
function, but I'd rather not put such a thing in this patch.

Yeah, there's certainly more left join shapes that we could remove.
e.g when the left join relation is not a singleton rel. We shouldn't
do anything to purposefully block additional join removals as a result
of adding UniqueKeys, but likely shouldn't go to any trouble to make
additional ones work. That can be done later.

Since you said "far from perfect" twice for this point and I only see one reason (we
may plan a node which we remove later), did I miss the other one?

a) additional planning work by not removing the join sooner. b) wrong
total page calculation.

In theory b) could be fixed by subtracting the removed join rels pages
after we remove it, but unfortunately, there's no point since we've
built the paths by that time already and we really only use the value
to determine how much IO is going to be random vs sequential, which is
determined during set_base_rel_pathlists()

d. not_null_cols should not be a Relids type, it should be Bitmapset.

If it were a plain Bitmapset, we would usually have to pass it with "&". Is that our practice?

Well, a Bitmapset pointer. Relids is saved for range table indexes.
Storing anything else in there is likely to lead to confusion.

12. The comment in the code below is not true. The List contains
Lists, of which contain UniqueKeys

List *uniquekeys; /* List of UniqueKey */

It is a list of UniqueKeys; each UniqueKey can have a list of exprs.

Hmm, so this is what I called a UniqueKeySet in the original patch.
I'm a bit divided by that change. With PathKeys, technically you can
make use of a Path with a given set of PathKeys if you only require
some leading subset of those keys. That's not the case for
UniqueKeys, it's all or nothing, so perhaps having the singular name
is better than the plural name I gave it. However, I'm not certain.

(Really PathKey does not seem like a great name in the first place
since it has nothing to do with keys)

13. I'm having trouble parsing the final sentence in:

+ * can only guarantee the uniqueness without considering the null values. This
+ * field is necessary for remove_useless_join & reduce_unique_semijions since
+ * these cases don't care about the null values.

Why is the field which stores the nullability of the key required for
code that does not care about the nullability of the key?

The guarantee field is introduced for cases like the following:

create table t1 (a int primary key, b int);
create table t2 (a int primary key, b int);
select .. from t1, (select b from t2 group by t2) t2 ..;

-- b is nullable, so t2(b) can't be a normal UniqueKey (b may still have
-- duplicated rows) even after:
create unique index t2_uk_b on t2(b);

-- but the left join can still be removed, since t2.b has a unique index
-- and the nullability doesn't matter here:
select t1.* from t1 left join t2 on (t1.b = t2.b);

Do you think we can do some optimization in this case? I don't understand
your question well.

OK, so by "don't care", you mean, don't duplicate NULL values. I
assumed you had meant that it does not matter either way, as in: don't
mind if there are NULL values or not. It might be best to have a go at
changing the wording to be more explicit to what you mean there.

#9Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#5)
1 attachment(s)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

The updated patch should fix all the issues. See the comments below for
more information.

On Tue, Mar 31, 2020 at 9:44 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Sun, 29 Mar 2020 at 20:50, Andy Fan <zhihui.fan1213@gmail.com> wrote:

Some other changes made in the new patch:
1. Fixed bug for UniqueKey calculation for OUTER join.
2. Fixed some typo error in comments.
3. Renamed the field "grantee" as "guarantee".

I've had a look over this patch. Thank for you doing further work on it.

I've noted down the following during my read of the code:

1. There seem to be some cases where joins are no longer being
detected as unique. This is evident in postgres_fdw.out. We shouldn't
be regressing any of these cases.

The issue here is that I didn't distinguish the one-row case in the
UniqueKey struct. When an outer relation is joined to a relation which
has only one row, at most one row can match each outer row. The
only-one-row cases in postgres_fdw.out come from aggregate calls.

I added a new field "onerow" to the UniqueKey struct, and optimized the
onerow UniqueKey to not record every expr. See add_uniquekey_for_onerow
and relation_is_onerow.

2. The following change does not seem like it should be part of this
patch. I understand you perhaps have done as you think it will
improve the performance of checking if an expression is in a list of
expressions.

- COMPARE_SCALAR_FIELD(varno);
+ /* Compare varattno first since it has higher selectivity than varno */
COMPARE_SCALAR_FIELD(varattno);
+ COMPARE_SCALAR_FIELD(varno);

If you think that is true, then please do it as a separate effort and
provide benchmarks with your findings.

Rolled back.

3. list_all_members_in. I think this would be better named as
list_is_subset. Please follow the lead of bms_is_subset().
Additionally, you should Assert that IsPointerList is true as there's
nothing else to indicate that it can't be used for an int or Oid list.

Done

4. guarantee is not a very good name for the field in UniqueKey.
Maybe something like is_not_null?

I tried is_not_null, but when is_not_null equals false it reads as a
double negation, which didn't feel good to me. In the end I used
multi_nullvals to indicate that the UniqueKey may yield multiple null
values, so the uniqueness is not guaranteed.

5. I think you should be performing a bms_del_member during join
removal rather than removing this Assert()

- Assert(bms_equal(rel->relids, root->all_baserels));

Done

FWIW, it's far from perfect that you've needed to delay the left join
removal, but I do understand why you've done it. It's also far from
perfect that you're including removed relations in the
total_table_pages calculation. c6e4133fae1 took some measures to
improve this calculation and this is making it worse again.

6. Can you explain why you moved the populate_baserel_uniquekeys()
call out of set_plain_rel_size()?

7. I don't think the removal of rel_supports_distinctness() is
warranted. Is it not ok to check if the relation has any uniquekeys?
It's possible, particularly in join_is_removable that this can save
quite a large amount of effort.

Done

8. Your spelling of unique is incorrect in many places:

src/backend/nodes/makefuncs.c: * makeUnqiueKey
src/backend/optimizer/path/uniquekeys.c:static List
*initililze_unqiuecontext_for_joinrel(RelOptInfo *joinrel,
src/backend/optimizer/path/uniquekeys.c: * check if combination of
unqiuekeys from both side is still useful for us,
src/backend/optimizer/path/uniquekeys.c: outerrel_uniquekey_ctx
= initililze_unqiuecontext_for_joinrel(joinrel, outerrel);
src/backend/optimizer/path/uniquekeys.c: innerrel_uniquekey_ctx
= initililze_unqiuecontext_for_joinrel(joinrel, innerrel);
src/backend/optimizer/path/uniquekeys.c: * we need to convert the
UnqiueKey from sub_final_rel to currel via the positions info in
src/backend/optimizer/path/uniquekeys.c: ctx->pos =
pos; /* the position in current targetlist, will be used to set
UnqiueKey */
src/backend/optimizer/path/uniquekeys.c: * Check if Unqiue key of the
innerrel is valid after join. innerrel's UniqueKey
src/backend/optimizer/path/uniquekeys.c: *
initililze_unqiuecontext_for_joinrel
src/backend/optimizer/path/uniquekeys.c: * all the unqiuekeys which
are not possible to use later

src/backend/optimizer/path/uniquekeys.c:initililze_unqiuecontext_for_joinrel(RelOptInfo
*joinrel, RelOptInfo *inputrel)
src/backend/optimizer/plan/analyzejoins.c: /*
This UnqiueKey is what we want */
src/backend/optimizer/plan/planner.c: /* If we the result if unqiue
already, we just return the input_rel directly */
src/include/nodes/pathnodes.h: * exprs is a list of exprs which is
unqiue on current RelOptInfo.
src/test/regress/expected/join.out:-- XXXX: since b.id is unqiue now
so the group by cluase is erased, so
src/test/regress/expected/select_distinct.out:-- create unqiue index on
dist_p
src/test/regress/expected/select_distinct.out:-- we also support
create unqiue index on each child tables
src/test/regress/sql/join.sql:-- XXXX: since b.id is unqiue now so the
group by cluase is erased, so
src/test/regress/sql/select_distinct.sql:-- create unqiue index on dist_p
src/test/regress/sql/select_distinct.sql:-- we also support create
unqiue index on each child tables

9. A few things wrong with the following fragment:

/* set the not null info now */
ListCell *lc;
foreach(lc, find_nonnullable_vars(qual))
{
Var *var = lfirst_node(Var, lc);
RelOptInfo *rel = root->simple_rel_array[var->varno];
if (var->varattno > InvalidAttrNumber)
rel->not_null_cols = bms_add_member(rel->not_null_cols, var->varattno);
}

a. including a function call in the foreach macro is not a practise
that we really follow. It's true that the macro now assigns the 2nd
param to a variable. Previous to 1cff1b95ab6 this was not the case and
it's likely best not to leave any bad examples around that code which
might get backported might follow.
b. We generally subtract InvalidAttrNumber from varattno when
including in a Bitmapset.
c. not_null_cols is not well named. I think notnullattrs
d. not_null_cols should not be a Relids type, it should be Bitmapset.

Above 2 Done

10. add_uniquekey_for_onerow() seems pretty wasteful. Is there really
a need to add each item in the rel's targetlist to the uniquekey list?
What if we just add an empty list to the unique keys, that way if we
need to test if some expr is a superset of any uniquekey, then we'll
see it is as any set is a superset of an empty set. Likely the empty
set of uniquekeys should be the only one in the rel's uniquekey list.

Now I use a single UniqueKey to represent this situation. See
add_uniquekey_for_onerow and relation_is_onerow.

11. In create_distinct_paths() the code is now calling
get_sortgrouplist_exprs() multiple times with the same input. I think
it would be better to just call it once and set the result in a local
variable.

Done

12. The comment in the code below is not true. The List contains
Lists, of which contain UniqueKeys

List *uniquekeys; /* List of UniqueKey */

13. I'm having trouble parsing the final sentence in:

+ * can only guarantee the uniqueness without considering the null values.
This
+ * field is necessary for remove_useless_join & reduce_unique_semijions
since
+ * these cases don't care about the null values.

Why is the field which stores the nullability of the key required for
code that does not care about the nullability of the key?

Also please check your spelling of the word "join"

Actually I couldn't find the spelling error for "join".

14. In the following fragment, instead of using i+1, please assign the
FormData_pg_attribute to a variable named attr and use attr->attnum.
Also, please see what I mentioned above about subtracting
InvalidAttrNumber

+ rel->not_null_cols = bms_add_member(rel->not_null_cols, i+1);

Done

15. The tests you've changed the expected outcome of in join.out
should be updated so that the GROUP BY and DISTINCT clause is not
removed. This will allow the test to continue testing what it was
intended to test. You can do this by changing the columns in the GROUP
BY clause so that the new code does not find uniquekeys for those
columns.

Done

16. The tests in aggregates.out are in a similar situation. There are
various tests trying to ensure that remove_useless_groupby_columns()
does what it's meant to do. You can modify these tests to add a join
which is non-unique to effectively duplicate the PK column.

There are some exceptions in this part.
1. The tests for remove_useless_groupby_columns have some overlap with
our current logic for erasing the group node, like the test for a single
relation, so I just modified 2 test cases for this purpose.
2. When I read the code in remove_useless_groupby_columns, I found a
new case for our UniqueKey:
select * from m1 where a > (select avg(b) from m2 group by *M1.A*);
Here m1.a will have var->varlevelsup > 0; how should we set the
UniqueKey for this grouprel? I added an incomplete check in the
add_uniquekey_from_sortgroups function, but I'm not sure if we need it.
3. remove_useless_groupby_columns maintains parse->constraintDeps when
it depends on a primary key, but UniqueKey doesn't maintain such data,
since we have a translation layer which should protect us from the
concurrency and isolation issues. Do we need to do that as well in
UniqueKey?

17. In your select_distinct tests, can you move away from naming the
tables starting with select_distinct? It makes reading queries pretty
hard.

e.g. explain (costs off) select distinct uk1, uk2 from
select_distinct_a where uk2 is not null;

When I first glanced that, I failed to see the underscores and the
query looked invalid.

18. Check the spelling if "erased". You have it spelt as "ereased" in

a couple of locations.

19. Please pay attention to the capitalisation of SQL keywords in the
test files you've modified. I understand we're very inconsistent in
this department in general, but we do at least try not to mix
capitalisation within the same file. Basically, please upper case the
keywords in select_distinct.sql

20. In addition to the above, please try to wrap long SQL lines so
they're below 80 chars.

All of the above four items are done.

I'll review the patch in more detail once the above points have been
addressed.

David

Attachments:

v4-0001-Maintain-UniqueKey-at-each-RelOptInfo-this-inform.patch (application/octet-stream)
From aa39394a639efbffcb80c7da632e75e92feb2a9a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Fri, 3 Apr 2020 09:55:22 +0800
Subject: [PATCH v4]  Maintain UniqueKey at each RelOptInfo, this information 
 can be used

to erasing distinct/group path if we are sure the result is unique
already. And rel_is_unique_for is modified to use UniqueKey to
detect more cases.
---
 .../postgres_fdw/expected/postgres_fdw.out    |   32 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    1 +
 src/backend/nodes/list.c                      |   31 +
 src/backend/nodes/makefuncs.c                 |   16 +
 src/backend/optimizer/path/Makefile           |    3 +-
 src/backend/optimizer/path/allpaths.c         |   25 +-
 src/backend/optimizer/path/joinrels.c         |    2 +
 src/backend/optimizer/path/uniquekeys.c       | 1110 +++++++++++++++++
 src/backend/optimizer/plan/analyzejoins.c     |  143 +--
 src/backend/optimizer/plan/initsplan.c        |   10 +
 src/backend/optimizer/plan/planmain.c         |   13 -
 src/backend/optimizer/plan/planner.c          |   37 +-
 src/backend/optimizer/prep/prepunion.c        |    2 +
 src/backend/optimizer/util/plancat.c          |   10 +
 src/include/nodes/makefuncs.h                 |    3 +
 src/include/nodes/nodes.h                     |    1 +
 src/include/nodes/pathnodes.h                 |   32 +
 src/include/nodes/pg_list.h                   |    2 +
 src/include/optimizer/paths.h                 |   34 +
 src/test/regress/expected/aggregates.out      |   73 +-
 src/test/regress/expected/join.out            |   55 +-
 src/test/regress/expected/select_distinct.out |  335 +++++
 src/test/regress/sql/aggregates.sql           |    6 +-
 src/test/regress/sql/join.sql                 |   12 +-
 src/test/regress/sql/select_distinct.sql      |  118 ++
 25 files changed, 1874 insertions(+), 232 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekeys.c

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 62c2697920..88441568b7 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1378,6 +1378,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1
 (8 rows)
 
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY:  The inner table in the plan is "T 3", which has a primary key on c1, that's why we have the new added "Inner Unique: true".
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
                                                                                                                                                                                              QUERY PLAN                                                                                                                                                                                             
@@ -1386,6 +1387,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
    Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
    ->  Nested Loop
          Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+         Inner Unique: true
          ->  Foreign Scan
                Output: ft4.c1, ft4.*, ft5.c1, ft5.*
                Relations: (public.ft4) FULL JOIN (public.ft5)
@@ -1410,7 +1412,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
                ->  Seq Scan on "S 1"."T 3"
                      Output: "T 3".c1, "T 3".ctid
                      Filter: ("T 3".c1 = 50)
-(28 rows)
+(29 rows)
 
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
  c1 | a  | b  
@@ -2902,22 +2904,20 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 f
 -- Outer query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
- Unique
+                                                       QUERY PLAN                                                       
+------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: ((SubPlan 1))
-   ->  Sort
-         Output: ((SubPlan 1))
-         Sort Key: ((SubPlan 1))
-         ->  Foreign Scan
-               Output: (SubPlan 1)
-               Relations: Aggregate on (public.ft2 t2)
-               Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
-               SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+   Sort Key: ((SubPlan 1))
+   ->  Foreign Scan
+         Output: (SubPlan 1)
+         Relations: Aggregate on (public.ft2 t2)
+         Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
+         SubPlan 1
+           ->  Foreign Scan on public.ft1 t1
+                 Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                 Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(11 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..a42cfa134d 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -453,6 +453,7 @@ EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY:  The inner table in the plan is "T 3", which has a primary key on c1, that's why we have the new added "Inner Unique: true".
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index bd0c58cd81..889eb2940e 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -688,6 +688,37 @@ list_member_oid(const List *list, Oid datum)
 	return false;
 }
 
+/*
+ * Return true iff there is an equal member in target for every
+ * member in members
+ */
+bool
+list_is_subset(const List *members, const List *target)
+{
+	const ListCell	*lc1, *lc2;
+
+	Assert(IsPointerList(members));
+	Assert(IsPointerList(target));
+	check_list_invariants(members);
+	check_list_invariants(target);
+
+	foreach(lc1, members)
+	{
+		bool found = false;
+		foreach(lc2, target)
+		{
+			if (equal(lfirst(lc1), lfirst(lc2)))
+			{
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			return false;
+	}
+	return true;
+}
+
 /*
  * Delete the n'th cell (counting from 0) in list.
  *
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index e8cdc90c31..e79757cb43 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -809,3 +809,19 @@ makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols)
 	v->va_cols = va_cols;
 	return v;
 }
+
+
+/*
+ * makeUniqueKey
+ */
+UniqueKey *
+makeUniqueKey(List *exprs, List *positions, bool multi_nullvals, bool onerow)
+{
+	UniqueKey  *ukey = makeNode(UniqueKey);
+	Assert(list_length(exprs) == list_length(positions));
+	ukey->exprs = exprs;
+	ukey->positions = positions;
+	ukey->multi_nullvals = multi_nullvals;
+	ukey->onerow = onerow;
+	return ukey;
+}
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..7b9820c25f 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekeys.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 905bbe77d8..d6f1a45f06 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -39,6 +39,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
@@ -222,13 +223,24 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	set_base_rel_pathlists(root);
 
 	/*
-	 * Generate access paths for the entire join tree.
+	 * Remove any useless outer joins.  Ideally this would be done during
+	 * jointree preprocessing, but the necessary information isn't available
+	 * until we've built baserel data structures, classified qual clauses and
+	 * built uniquekeys
 	 */
-	rel = make_rel_from_joinlist(root, joinlist);
+	joinlist = remove_useless_joins(root, joinlist);
+
+	/*
+	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
+	 * Likewise, this can't be done until now for lack of needed info.
+	 */
+	reduce_unique_semijoins(root);
 
 	/*
-	 * The result should join all and only the query's base rels.
+	 * Generate access paths for the entire join tree.
 	 */
+	rel = make_rel_from_joinlist(root, joinlist);
+
 	Assert(bms_equal(rel->relids, root->all_baserels));
 
 	return rel;
@@ -786,6 +798,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
+
+	/* Set UniqueKeys for this relation */
+	populate_baserel_uniquekeys(root, rel, rel->indexlist);
 }
 
 /*
@@ -1276,6 +1291,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Add paths to the append relation. */
 	add_paths_to_append_rel(root, rel, live_childrels);
+	if (IS_PARTITIONED_REL(rel))
+		populate_partitionedrel_uniquekeys(root, rel, live_childrels);
 }
 
 
@@ -2349,6 +2366,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 										  pathkeys, required_outer));
 	}
 
+	convert_subquery_uniquekeys(root, rel, sub_final_rel);
+
 	/* If outer rel allows parallelism, do same for partial paths. */
 	if (rel->consider_parallel && bms_is_empty(required_outer))
 	{
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index a21c295b99..c6799aa48c 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -920,6 +920,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
 
 	/* Apply partitionwise join technique, if possible. */
 	try_partitionwise_join(root, rel1, rel2, joinrel, sjinfo, restrictlist);
+
+	populate_joinrel_uniquekeys(root, joinrel, rel1, rel2, restrictlist, sjinfo->jointype);
 }
 
 
diff --git a/src/backend/optimizer/path/uniquekeys.c b/src/backend/optimizer/path/uniquekeys.c
new file mode 100644
index 0000000000..4cba7c9698
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekeys.c
@@ -0,0 +1,1110 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekeys.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/optimizer.h"
+#include "rewrite/rewriteManip.h"
+
+
+/*
+ * This struct is used to help populate_joinrel_uniquekeys.
+ * added_to_joinrel is set to true once a uniquekey has been added to the
+ * joinrel.  For a joinrel, if both sides have a UniqueKey, their combination
+ * must be unique for the joinrel as well, but we don't need to add it if
+ * either of them has been added to the joinrel already.  We use this struct
+ * to maintain such info.
+ */
+typedef struct UniqueKeyContextData
+{
+	UniqueKey	*uniquekey;
+	/* Set to true if the unique key has been added to joinrel->uniquekeys */
+	bool	added_to_joinrel;
+	/* If this uniquekey is still useful after join */
+	bool	useful;
+} *UniqueKeyContext;
+
+
+static List *gather_mergeable_baserestrictlist(RelOptInfo *rel);
+static List *gather_mergeable_joinclauses(RelOptInfo *joinrel,
+										  RelOptInfo *rel1,
+										  RelOptInfo *rel2,
+										  List *restrictlist,
+										  JoinType jointype);
+static bool match_index_to_baserestrictinfo(IndexOptInfo *unique_ind,
+											List *restrictlist);
+static List *initililze_uniquecontext_for_joinrel(RelOptInfo *joinrel,
+												  RelOptInfo *inputrel);
+
+static bool innerrel_keeps_unique(PlannerInfo *root,
+								  RelOptInfo *outerrel,
+								  RelOptInfo *innerrel,
+								  List *restrictlist,
+								  bool reverse);
+static bool clause_sides_match_join(RestrictInfo *rinfo,
+									Relids outerrelids,
+									Relids innerrelids);
+static void add_uniquekey_from_index(RelOptInfo *rel,
+									 IndexOptInfo *unique_index);
+static void add_uniquekey_for_onerow(RelOptInfo *rel);
+
+/* Used for unique indexes checking for partitioned table */
+static bool index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind);
+static IndexOptInfo *simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+													 IndexOptInfo *from);
+static bool simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2);
+static void adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+											  RelOptInfo *childrel,
+											  List **global_unique_index);
+/* Helper function for groupres/distinctrel */
+static void add_uniquekey_from_sortgroups(PlannerInfo *root,
+										  RelOptInfo *rel,
+										  List *sortgroups);
+
+/*
+ * populate_baserel_uniquekeys
+ *		Populate 'baserel' uniquekeys list by looking at the rel's unique
+ * indexes and baserestrictinfo
+ */
+void
+populate_baserel_uniquekeys(PlannerInfo *root,
+							RelOptInfo *baserel,
+							List *indexlist)
+{
+	ListCell *lc;
+	List	*restrictlist = gather_mergeable_baserestrictlist(baserel);
+	bool	return_one_row = false;
+	List	*matched_uk_indexes = NIL;
+
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	if (baserel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+		/*
+		 * Setting a UniqueKey on a member rel is useless; we have to
+		 * recompute it at the upper level, see
+		 * populate_partitionedrel_uniquekeys for reference
+		 */
+		return;
+
+	foreach(lc, indexlist)
+	{
+		IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		if (match_index_to_baserestrictinfo(ind, restrictlist))
+		{
+			return_one_row = true;
+			break;
+		}
+
+		if (ind->indexprs != NIL)
+			/* We can't guarantee that an expression never returns NULL, so ignore it */
+			continue;
+		matched_uk_indexes = lappend(matched_uk_indexes, ind);
+	}
+
+	if (return_one_row)
+	{
+		add_uniquekey_for_onerow(baserel);
+	}
+	else
+	{
+		foreach(lc, matched_uk_indexes)
+			add_uniquekey_from_index(baserel, lfirst_node(IndexOptInfo, lc));
+	}
+}
+
+
+/*
+ * populate_partitionedrel_uniquekeys
+ * A unique index can be used for a UniqueKey if:
+ * 1). It includes the partition keys.
+ * 2). All the childrels have the same index.
+ */
+void
+populate_partitionedrel_uniquekeys(PlannerInfo *root,
+								   RelOptInfo *rel,
+								   List *childrels)
+{
+	ListCell	*lc;
+	List	*global_unique_indexlist = NIL;
+	RelOptInfo *childrel;
+	bool is_first = true;
+
+	Assert(IS_PARTITIONED_REL(rel));
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	if (childrels == NIL)
+		return;
+
+	childrel = linitial_node(RelOptInfo, childrels);
+	foreach(lc, childrel->indexlist)
+	{
+		IndexOptInfo *ind = lfirst(lc);
+		IndexOptInfo *global_ind;
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		global_ind = simple_copy_indexinfo_to_parent(rel, ind);
+		/*
+		 * If the unique index doesn't contain the partition key, then it is unique
+		 * on this partition only, so it is useless for us.
+		 */
+		if (!index_constains_partkey(rel, global_ind))
+			continue;
+		global_unique_indexlist = lappend(global_unique_indexlist,  global_ind);
+	}
+
+	/* Fast path */
+	if (global_unique_indexlist == NIL)
+		return;
+
+	foreach(lc, childrels)
+	{
+		RelOptInfo *child = lfirst(lc);
+		if (is_first)
+		{
+			is_first = false;
+			continue;
+		}
+		adjust_partition_unique_indexlist(rel, child, &global_unique_indexlist);
+	}
+
+	/*
+	 * Now we have the list of unique indexes which are exactly the same on
+	 * all childrels.  Set the UniqueKeys just like for a non-partitioned
+	 * table.
+	 */
+	populate_baserel_uniquekeys(root, rel, global_unique_indexlist);
+}
+
+
+/*
+ * populate_distinctrel_uniquekeys
+ */
+void
+populate_distinctrel_uniquekeys(PlannerInfo *root,
+								RelOptInfo *inputrel,
+								RelOptInfo *distinctrel)
+{
+	/* The unique key before the distinct is still valid */
+	distinctrel->uniquekeys = list_copy(inputrel->uniquekeys);
+	add_uniquekey_from_sortgroups(root, distinctrel, root->parse->distinctClause);
+}
+
+/*
+ * populate_grouprel_uniquekeys
+ */
+void
+populate_grouprel_uniquekeys(PlannerInfo *root,
+							 RelOptInfo *grouprel)
+{
+	Query *parse = root->parse;
+	if (parse->hasTargetSRFs)
+		return;
+
+	if (parse->groupingSets)
+		return;
+
+	/* A normal GROUP BY without grouping sets */
+	if (parse->groupClause)
+		add_uniquekey_from_sortgroups(root,
+									  grouprel,
+									  root->parse->groupClause);
+	else
+		/* The query has aggregates but no GROUP BY, so it must return one row */
+		add_uniquekey_for_onerow(grouprel);
+}
+
+/*
+ * simple_copy_uniquekeys
+ * Using a function for this one-line operation makes it easy to find the
+ * places where we simply copy the uniquekeys.
+ */
+void
+simple_copy_uniquekeys(RelOptInfo *oldrel,
+					   RelOptInfo *newrel)
+{
+	newrel->uniquekeys = oldrel->uniquekeys;
+}
+
+/*
+ *  populate_unionrel_uniquiekeys
+ */
+void
+populate_unionrel_uniquiekeys(PlannerInfo *root,
+							  RelOptInfo *unionrel)
+{
+	ListCell	*lc;
+	List	*exprs = NIL;
+	List	*colnos = NIL;
+	int i = 1;
+
+	Assert(unionrel->uniquekeys == NIL);
+
+	foreach(lc,  unionrel->reltarget->exprs)
+	{
+		exprs = lappend(exprs, lfirst(lc));
+		colnos = lappend_int(colnos, i);
+		i++;
+	}
+	unionrel->uniquekeys = lappend(unionrel->uniquekeys,
+								   makeUniqueKey(exprs, colnos, true, false));
+}
+
+/*
+ * populate_joinrel_uniquekeys
+ *
+ * Populate uniquekeys for the joinrel.  We check each input relation to see
+ * if its UniqueKey is still valid via innerrel_keeps_unique; if so, we add
+ * it to the joinrel.  The multi_nullvals field will be changed from false
+ * to true for some outer join cases.
+ *
+ * For a uniquekey in either input rel which can't stay unique after the
+ * join, we still check whether the combination of UniqueKeys from both
+ * sides is useful for us.  If so, we add it to the joinrel as well;
+ * multi_nullvals is set to true if either side has multi_nullvals set.
+ */
+void
+populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List *restrictlist, JoinType jointype)
+{
+	ListCell *lc, *lc2;
+	List	*clause_list = NIL;
+	List	*outerrel_ukey_ctx;
+	List	*innerrel_ukey_ctx;
+	bool	outer_is_onerow, inner_is_onerow;
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	/* For SEMI/ANTI joins, only the outer relation's uniquekeys matter */
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+	{
+		foreach(lc, outerrel->uniquekeys)
+		{
+			UniqueKey	*uniquekey = lfirst_node(UniqueKey, lc);
+			if (list_is_subset(uniquekey->exprs, joinrel->reltarget->exprs))
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, uniquekey);
+		}
+		return;
+	}
+
+	/* Fast path */
+	if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+		return;
+
+	outer_is_onerow = relation_is_onerow(outerrel);
+	inner_is_onerow = relation_is_onerow(innerrel);
+
+	outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel, outerrel);
+	innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel, innerrel);
+
+	clause_list = gather_mergeable_joinclauses(joinrel, outerrel, innerrel,
+											   restrictlist, jointype);
+
+
+	if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+	{
+		foreach(lc, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				/* The UniqueKey on baserel is not useful on the joinrel */
+				ctx->useful = false;
+				continue;
+			}
+			if ((jointype == JOIN_LEFT || jointype == JOIN_FULL) && !ctx->uniquekey->multi_nullvals)
+			{
+				/* Change the multi_nullvals to true at this case */
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+											  makeUniqueKey(ctx->uniquekey->exprs,
+															ctx->uniquekey->positions,
+															true,
+															false));
+			}
+			else if (inner_is_onerow)
+			{
+				/*
+				 * Rows in innerrel can't be duplicated by this join, so if
+				 * innerrel is onerow, the join result will be onerow as
+				 * well.  Note: onerow implies multi_nullvals = false.
+				 */
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else
+			{
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, ctx->uniquekey);
+				ctx->added_to_joinrel = true;
+			}
+		}
+	}
+
+	if (innerrel_keeps_unique(root, innerrel, outerrel, clause_list, true))
+	{
+		foreach(lc, outerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+			/* NULL values in outer rel can be duplicated under JOIN_FULL only */
+			if (jointype == JOIN_FULL && ctx->uniquekey->multi_nullvals)
+			{
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+											  makeUniqueKey(ctx->uniquekey->exprs,
+															ctx->uniquekey->positions,
+															true,
+															false));
+
+			}
+			else if (outer_is_onerow)
+			{
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else
+			{
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, ctx->uniquekey);
+				ctx->added_to_joinrel = true;
+			}
+		}
+	}
+
+	/*
+	 * The combination of UniqueKeys from both sides is unique as well,
+	 * regardless of the join type, but don't bother to add it if a subset
+	 * of it has been added already.
+	 */
+	foreach(lc, outerrel_ukey_ctx)
+	{
+		UniqueKeyContext ctx1 = (UniqueKeyContext) lfirst(lc);
+		if (ctx1->added_to_joinrel || !ctx1->useful)
+			continue;
+		foreach(lc2, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx2 = (UniqueKeyContext) lfirst(lc2);
+			List	*exprs = NIL, *colnos = NIL;
+			bool multi_nullvals;
+			if (ctx2->added_to_joinrel || !ctx2->useful)
+				continue;
+			exprs = list_copy(ctx1->uniquekey->exprs);
+			colnos = list_copy(ctx1->uniquekey->positions);
+			exprs = list_concat(exprs, ctx2->uniquekey->exprs);
+			colnos = list_concat(colnos, ctx2->uniquekey->positions);
+
+			multi_nullvals = ctx1->uniquekey->multi_nullvals || ctx2->uniquekey->multi_nullvals;
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+										  makeUniqueKey(exprs,
+														colnos,
+														multi_nullvals,
+														/* All onerow cases has been handled above */
+														false));
+		}
+	}
+}
+
+
+/*
+ * Used to avoid multiple scans of rel->reltarget->exprs; see convert_subquery_uniquekeys
+ */
+typedef struct SubqueryUniqueKeyData
+{
+	/*
+	 * Only a plain Var referencing a unique column of the subquery is unique
+	 * as well; we can't guarantee anything for other expressions
+	 */
+	Var *var;
+
+	/* The position of the var in the rel->reltarget */
+	int pos;
+} *SubqueryUniqueKeyContext;
+
+/*
+ * convert_subquery_uniquekeys
+ *
+ * currel is the RelOptInfo at the current query level; sub_final_rel comes
+ * from fetch_upper_rel.  We need to convert the UniqueKeys from sub_final_rel
+ * to currel via the positions info in UniqueKey.  Example:
+ *
+ * select t2.colx from t1, (select max(y), colx from t3 group by colx) t2 where ..
+ * The UniqueKey in sub_final_rel is Var(varno=1, varattno=N), position=2.
+ * The UniqueKey in currel will be Var(varno=2, varattno=2), position=1.
+ */
+void
+convert_subquery_uniquekeys(PlannerInfo *root,
+								 RelOptInfo *currel,
+								 RelOptInfo *sub_final_rel)
+{
+	SubqueryUniqueKeyContext *ctx_array;
+	SubqueryUniqueKeyContext ctx;
+	Index max_colno_subq = 0;
+	ListCell	*lc, *lc2;
+	int pos = 0;
+
+	if (sub_final_rel->uniquekeys == NIL)
+		/* This should be a common case */
+		return;
+
+	if (relation_is_onerow(sub_final_rel))
+	{
+		add_uniquekey_for_onerow(currel);
+		return;
+	}
+	/*
+	 * Calculate the max colno in the subquery.  In fact we could get this
+	 * with list_length(sub_final_rel->reltarget->exprs); however, reltarget
+	 * is not set on the UPPERREL_FINAL relation, so do it this way
+	 */
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey * ukey = lfirst_node(UniqueKey, lc);
+		foreach(lc2, ukey->positions)
+		{
+			Index colno = lfirst_int(lc2);
+			if (max_colno_subq < colno)
+				max_colno_subq = colno;
+		}
+	}
+
+	Assert(max_colno_subq > 0);
+	ctx_array = palloc0(sizeof(SubqueryUniqueKeyContext) * (max_colno_subq + 1));
+
+	/*
+	 * Create an array entry for each Var in currel->reltarget->exprs; the
+	 * array index is the colno in the subquery, so that we can find the expr
+	 * quickly given a colno_subq
+	 */
+	foreach(lc, currel->reltarget->exprs)
+	{
+		Var *var;
+		int colno_subq;
+		pos++;
+		if (!IsA(lfirst(lc), Var))
+			continue;
+
+		var = lfirst_node(Var, lc);
+		colno_subq = var->varattno;
+		if (colno_subq > max_colno_subq)
+			continue;
+		ctx_array[colno_subq] = palloc0(sizeof(struct SubqueryUniqueKeyData));
+		ctx = ctx_array[colno_subq]; /* corresponding to subquery's uniquekey->positions[x] */
+		ctx->pos = pos; /* the position in current targetlist,  will be used to set UniqueKey */
+		ctx->var = var;
+	}
+
+	/* Convert the UniqueKey from sub_final_rel to currel */
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey * ukey = lfirst_node(UniqueKey, lc);
+		bool uniquekey_useful = true;
+		List	*exprs = NIL;
+		List	*colnos = NIL;
+		foreach(lc2, ukey->positions)
+		{
+			Index sub_colno = lfirst_int(lc2);
+			ctx = ctx_array[sub_colno];
+			if (ctx == NULL)
+			{
+				/* The column is not used outside */
+				uniquekey_useful = false;
+				break;
+			}
+			exprs = lappend(exprs, ctx->var);
+			colnos = lappend_int(colnos, ctx->pos);
+		}
+		if (uniquekey_useful)
+			currel->uniquekeys = lappend(currel->uniquekeys,
+										 makeUniqueKey(exprs,
+													   colnos,
+													   ukey->multi_nullvals,
+													   ukey->onerow));
+	}
+}
+
+
+/*
+ * innerrel_keeps_unique
+ *
+ * Check if the UniqueKeys of the innerrel are still valid after the join.
+ * They remain valid if, for some UniqueKey of the outerrel, every one of its
+ * exprs appears on the outer side of a mergeable clause in clause_list.
+ * Note: clause_list must be a list of mergeable RestrictInfos already.
+ */
+static bool
+innerrel_keeps_unique(PlannerInfo *root,
+					  RelOptInfo *outerrel,
+					  RelOptInfo *innerrel,
+					  List *clause_list,
+					  bool reverse)
+{
+	ListCell	*lc, *lc2, *lc3;
+
+	if (outerrel->uniquekeys == NIL || innerrel->uniquekeys == NIL)
+		return false;
+
+	if (relation_is_onerow(innerrel))
+		return true;
+
+	foreach(lc, outerrel->uniquekeys)
+	{
+		List	*outer_uq_exprs = lfirst_node(UniqueKey, lc)->exprs;
+		bool clauselist_matchs_all_exprs = true;
+		foreach(lc2, outer_uq_exprs)
+		{
+			Node *outer_uq_expr = lfirst(lc2);
+			bool find_uq_expr_in_clauselist = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *outer_expr;
+				if (reverse)
+					outer_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				else
+					outer_expr = rinfo->outer_is_left ? get_leftop(rinfo->clause) : get_rightop(rinfo->clause);
+				if (equal(outer_expr, outer_uq_expr))
+				{
+					find_uq_expr_in_clauselist = true;
+					break;
+				}
+			}
+			if (!find_uq_expr_in_clauselist)
+			{
+				/* No need to check the next exprs in the current uniquekey */
+				clauselist_matchs_all_exprs = false;
+				break;
+			}
+		}
+
+		if (clauselist_matchs_all_exprs)
+			/*
+			 * If the clause_list matches every expr of some UniqueKey from
+			 * the outerrel, the innerrel will be unique, based on the fact
+			 * that innerrel->uniquekeys != NIL, which is checked at the
+			 * beginning
+			 */
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * relation_is_onerow
+ * Check if it is a one-row relation by checking UniqueKey
+ */
+bool
+relation_is_onerow(RelOptInfo *rel)
+{
+	UniqueKey *ukey;
+	if (rel->uniquekeys == NIL)
+		return false;
+	ukey = linitial_node(UniqueKey, rel->uniquekeys);
+	if (ukey->onerow)
+	{
+		/* Some helpful tiny check for UniqueKey */
+
+		/* 1. We will only store one UniqueKey for this rel */
+		Assert(list_length(rel->uniquekeys) == 1);
+		/* 2. multi_nullvals must be false */
+		Assert(!ukey->multi_nullvals);
+		/* 3. exprs & positions must be NIL */
+		Assert(ukey->exprs == NIL);
+		Assert(ukey->positions == NIL);
+	}
+	return ukey->onerow;
+}
+
+/*
+ * relation_has_uniquekeys_for
+ *		Returns true if we have proofs that 'rel' cannot return multiple rows with
+ *		the same values in each of 'exprs'.  Otherwise returns false.
+ */
+bool
+relation_has_uniquekeys_for(PlannerInfo *root, RelOptInfo *rel, List *exprs)
+{
+	ListCell *lc;
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		if (ukey->multi_nullvals)
+			continue;
+		if (list_is_subset(ukey->exprs, exprs))
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * Examine the rel's restriction clauses for usable var = const clauses
+ */
+static List*
+gather_mergeable_baserestrictlist(RelOptInfo *rel)
+{
+	List	*restrictlist = NIL;
+	ListCell	*lc;
+	foreach(lc, rel->baserestrictinfo)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
+
+		/*
+		 * Note: can_join won't be set for a restriction clause, but
+		 * mergeopfamilies will be if it has a mergejoinable operator and
+		 * doesn't contain volatile functions.
+		 */
+		if (restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * The clause certainly doesn't refer to anything but the given rel.
+		 * If either side is pseudoconstant then we can use it.
+		 */
+		if (bms_is_empty(restrictinfo->left_relids))
+		{
+			/* righthand side is inner */
+			restrictinfo->outer_is_left = true;
+		}
+		else if (bms_is_empty(restrictinfo->right_relids))
+		{
+			/* lefthand side is inner */
+			restrictinfo->outer_is_left = false;
+		}
+		else
+			continue;
+
+		/* OK, add to list */
+		restrictlist = lappend(restrictlist, restrictinfo);
+	}
+	return restrictlist;
+}
+
+
+/*
+ * gather_mergeable_joinclauses
+ */
+static List*
+gather_mergeable_joinclauses(RelOptInfo *joinrel,
+							 RelOptInfo *outerrel,
+							 RelOptInfo *innerrel,
+							 List *restrictlist,
+							 JoinType jointype)
+{
+	List	*clause_list = NIL;
+	ListCell	*lc;
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *)lfirst(lc);
+		if (IS_OUTER_JOIN(jointype) &&
+			RINFO_IS_PUSHED_DOWN(restrictinfo, joinrel->relids))
+			continue;
+
+		/* Ignore if it's not a mergejoinable clause */
+		if (!restrictinfo->can_join ||
+			restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * Check if clause has the form "outer op inner" or "inner op outer",
+		 * and if so mark which side is inner.
+		 */
+		if (!clause_sides_match_join(restrictinfo, outerrel->relids, innerrel->relids))
+			continue;			/* no good for these input relations */
+
+		/* OK, add to list */
+		clause_list = lappend(clause_list, restrictinfo);
+	}
+	return clause_list;
+}
+
+
+/*
+ * Return true if uk = Const in the restrictlist
+ */
+static bool
+match_index_to_baserestrictinfo(IndexOptInfo *unique_ind, List *restrictlist)
+{
+	int c = 0;
+
+	/* A fast path to avoid the double-loop scan below */
+	if (list_length(restrictlist) < unique_ind->ncolumns)
+		return false;
+
+	for(c = 0;  c < unique_ind->ncolumns; c++)
+	{
+		ListCell	*lc;
+		bool	found_in_restrictinfo = false;
+		foreach(lc, restrictlist)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			Node	   *rexpr;
+
+			/*
+			 * The condition's equality operator must be a member of the
+			 * index opfamily, else it is not asserting the right kind of
+			 * equality behavior for this index.  We check this first
+			 * since it's probably cheaper than match_index_to_operand().
+			 */
+			if (!list_member_oid(rinfo->mergeopfamilies, unique_ind->opfamily[c]))
+				continue;
+
+			/*
+			 * XXX at some point we may need to check collations here too.
+			 * For the moment we assume all collations reduce to the same
+			 * notion of equality.
+			 */
+
+			/* OK, see if the condition operand matches the index key */
+			if (rinfo->outer_is_left)
+				rexpr = get_rightop(rinfo->clause);
+			else
+				rexpr = get_leftop(rinfo->clause);
+
+			if (match_index_to_operand(rexpr, c, unique_ind))
+			{
+				found_in_restrictinfo = true;
+				break;
+			}
+		}
+		if (!found_in_restrictinfo)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * add_uniquekey_from_index
+ *	Add a UniqueKey only when every indexed column appears in rel->reltarget
+ */
+static void
+add_uniquekey_from_index(RelOptInfo *rel, IndexOptInfo *unique_index)
+{
+	int	pos;
+	List	*exprs = NIL;
+	List	*positions = NIL;
+	bool	multi_nullvals = false;
+
+	/*
+	 * Fast path.  Check if the indexed columns are used in this relation;
+	 * if not, return quickly.
+	 */
+	for(pos = 0; pos < unique_index->ncolumns; pos++)
+	{
+		int attno = unique_index->indexkeys[pos] - rel->min_attr;
+		if (bms_is_empty(rel->attr_needed[attno]))
+			return;
+	}
+
+	/* We still need to check the rel->reltarget->exprs to get the exprs and positions */
+	for(pos = 0; pos < unique_index->ncolumns; pos++)
+	{
+		ListCell	*lc;
+		bool	find_in_exprs = false;
+
+		foreach(lc, rel->reltarget->exprs)
+		{
+			Var *var;
+			if (!IsA(lfirst(lc), Var))
+				continue;
+			var = lfirst_node(Var, lc);
+			if (match_index_to_operand((Node *)lfirst(lc), pos, unique_index))
+			{
+				find_in_exprs = true;
+				exprs = lappend(exprs, lfirst(lc));
+				positions = lappend_int(positions, pos+1);
+				if (!bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
+								   rel->notnullattrs))
+					multi_nullvals = true;
+				break;
+			}
+		}
+		if (!find_in_exprs)
+			return;
+	}
+
+	if (exprs != NIL)
+	{
+		rel->uniquekeys = lappend(rel->uniquekeys,
+								  makeUniqueKey(exprs, positions, multi_nullvals, false));
+	}
+}
+
+
+/*
+ * add_uniquekey_for_onerow
+ * If we are sure the relation returns only one row, then all the columns
+ * are unique.  There is no need to create a UniqueKey for every expr; we
+ * just set UniqueKey->onerow to true
+ */
+static void
+add_uniquekey_for_onerow(RelOptInfo *rel)
+{
+	rel->uniquekeys = list_make1(makeUniqueKey(NIL, /* No need to set exprs */
+											   NIL, /* No need to set positions */
+											   false, /* onerow can't have multi_nullvals */
+											   true));
+}
+
+/*
+ * initililze_uniquecontext_for_joinrel
+ * Return a List of UniqueKeyContext for an inputrel; each context starts as
+ * useful and not yet added to the joinrel, and is updated by the caller
+ */
+static List *
+initililze_uniquecontext_for_joinrel(RelOptInfo *joinrel,  RelOptInfo *inputrel)
+{
+	List	*res = NIL;
+	ListCell *lc;
+	foreach(lc,  inputrel->uniquekeys)
+	{
+		UniqueKeyContext context;
+		context = palloc(sizeof(struct UniqueKeyContextData));
+		context->uniquekey = lfirst_node(UniqueKey, lc);
+		context->added_to_joinrel = false;
+		context->useful = true;
+		res = lappend(res, context);
+	}
+	return res;
+}
+
+/*
+ * clause_sides_match_join
+ *	  Determine whether a join clause is of the right form to use in this join.
+ *
+ * We already know that the clause is a binary opclause referencing only the
+ * rels in the current join.  The point here is to check whether it has the
+ * form "outerrel_expr op innerrel_expr" or "innerrel_expr op outerrel_expr",
+ * rather than mixing outer and inner vars on either side.  If it matches,
+ * we set the transient flag outer_is_left to identify which side is which.
+ */
+static bool
+clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
+						Relids innerrelids)
+{
+	if (bms_is_subset(rinfo->left_relids, outerrelids) &&
+		bms_is_subset(rinfo->right_relids, innerrelids))
+	{
+		/* lefthand side is outer */
+		rinfo->outer_is_left = true;
+		return true;
+	}
+	else if (bms_is_subset(rinfo->left_relids, innerrelids) &&
+			 bms_is_subset(rinfo->right_relids, outerrelids))
+	{
+		/* righthand side is outer */
+		rinfo->outer_is_left = false;
+		return true;
+	}
+	return false;				/* no good for these input relations */
+}
+
+
+/*
+ * Partitioned table Unique Keys.
+ * A partitioned table's unique keys are maintained as follows:
+ * 1. The index must be unique as usual.
+ * 2. The index must contain the partition keys.
+ * 3. The index must exist on all the child rels; see simple_indexinfo_equal
+ *    for how we compare them.
+ */
+
+/*
+ * index_constains_partkey
+ * Return true if the index contains the partition key.
+ */
+static bool
+index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind)
+{
+	ListCell	*lc;
+	int	i;
+	Assert(IS_PARTITIONED_REL(partrel));
+
+	for(i = 0; i < partrel->part_scheme->partnatts; i++)
+	{
+		Node *part_expr = linitial(partrel->partexprs[i]);
+		bool found_in_index = false;
+		foreach(lc, ind->indextlist)
+		{
+			Expr *index_expr = lfirst_node(TargetEntry, lc)->expr;
+			if (equal(index_expr, part_expr))
+			{
+				found_in_index = true;
+				break;
+			}
+		}
+		if (!found_in_index)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * simple_indexinfo_equal
+ *
+ * Check whether the two indexes are the same.  The indexes here are COPIED
+ * from the childrel with some tiny changes applied (see
+ * simple_copy_indexinfo_to_parent).
+ */
+static bool
+simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2)
+{
+	Size oid_cmp_len = sizeof(Oid) * ind1->ncolumns;
+	return ind1->ncolumns == ind2->ncolumns &&
+		ind1->unique == ind2->unique &&
+		memcmp(ind1->indexkeys, ind2->indexkeys, sizeof(int) * ind1->ncolumns) == 0 &&
+		memcmp(ind1->opfamily, ind2->opfamily, oid_cmp_len) == 0 &&
+		memcmp(ind1->opcintype, ind2->opcintype, oid_cmp_len) == 0 &&
+		memcmp(ind1->sortopfamily, ind2->sortopfamily, oid_cmp_len) == 0 &&
+		equal(ind1->indextlist, ind2->indextlist);
+}
+
+/*
+ * These macros are copied from copyfuncs.c; we'd rather not make
+ * simple_copy_indexinfo_to_parent public, since it is such a customized copy.
+ */
+
+#define COPY_POINTER_FIELD(fldname, sz) \
+	do { \
+		Size	_size = (sz); \
+		newnode->fldname = palloc(_size); \
+		memcpy(newnode->fldname, from->fldname, _size); \
+	} while (0)
+
+#define COPY_NODE_FIELD(fldname) \
+	(newnode->fldname = copyObjectImpl(from->fldname))
+
+#define COPY_SCALAR_FIELD(fldname) \
+	(newnode->fldname = from->fldname)
+
+
+/*
+ * simple_copy_indexinfo_to_parent
+ * Copy the IndexOptInfo from a child rel to the parent.  The copy is used to
+ * 1. Test whether the same index exists on all the childrels.
+ * 2. Test whether parentrel->reltarget/baserestrictinfo matches this index.
+ * The copied and modified index is only used within this scope.
+ */
+static IndexOptInfo *
+simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+								IndexOptInfo *from)
+{
+	IndexOptInfo *newnode = makeNode(IndexOptInfo);
+
+	COPY_SCALAR_FIELD(ncolumns);
+	COPY_SCALAR_FIELD(nkeycolumns);
+	COPY_SCALAR_FIELD(unique);
+	COPY_SCALAR_FIELD(immediate);
+	/* We just need to know if it is NIL or not */
+	COPY_SCALAR_FIELD(indpred);
+	COPY_SCALAR_FIELD(predOK);
+	COPY_POINTER_FIELD(indexkeys, from->ncolumns * sizeof(int));
+	COPY_POINTER_FIELD(indexcollations, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opfamily, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opcintype, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(sortopfamily, from->ncolumns * sizeof(Oid));
+	COPY_NODE_FIELD(indextlist);
+
+	/*
+	 * Translate the Vars to the parent's relid so that later index matching
+	 * (e.g. against "pk = 1" quals or rel->reltarget) works.
+	 */
+	ChangeVarNodes((Node*) newnode->indextlist,
+				   from->rel->relid,
+				   parentrel->relid, 0);
+	newnode->rel = parentrel;
+	return newnode;
+}
+
+/*
+ * adjust_partition_unique_indexlist
+ *
+ * Check every index in global_unique_indexes against the given childrel;
+ * any index that does not also exist on the childrel is removed from
+ * the list.
+ */
+static void
+adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+								  RelOptInfo *childrel,
+								  List **global_unique_indexes)
+{
+	ListCell	*lc, *lc2;
+	foreach(lc, *global_unique_indexes)
+	{
+		IndexOptInfo	*g_ind = lfirst_node(IndexOptInfo, lc);
+		bool found_in_child = false;
+
+		foreach(lc2, childrel->indexlist)
+		{
+			IndexOptInfo   *p_ind = lfirst_node(IndexOptInfo, lc2);
+			IndexOptInfo   *p_ind_copy;
+			if (!p_ind->unique || !p_ind->immediate ||
+				(p_ind->indpred != NIL && !p_ind->predOK))
+				continue;
+			p_ind_copy = simple_copy_indexinfo_to_parent(parentrel, p_ind);
+			if (simple_indexinfo_equal(p_ind_copy, g_ind))
+			{
+				found_in_child = true;
+				break;
+			}
+		}
+
+		if (!found_in_child)
+			/* The same index doesn't exist on this childrel, so remove it */
+			*global_unique_indexes = foreach_delete_current(*global_unique_indexes, lc);
+	}
+}
+
+/* Helper function for grouprel/distinctrel */
+static void
+add_uniquekey_from_sortgroups(PlannerInfo *root, RelOptInfo *rel, List *sortgroups)
+{
+	Query *parse = root->parse;
+	ListCell *lc;
+	List	*exprs = NIL,  *colnos = NIL;
+	
+	/*
+	 * XXX: If any Vars reference an upper query level, the semantics are
+	 * imprecise, so we give up.  levelsup = 1 is just a first cut; perhaps
+	 * we should check every level other than 0, which would need another
+	 * pull_var-style walker.
+	 */
+	List	*upper_vars = pull_vars_of_level((Node*)sortgroups, 1);
+
+	if (upper_vars != NIL)
+		return;
+	
+	foreach(lc, sortgroups)
+	{
+		Index sortref = lfirst_node(SortGroupClause, lc)->tleSortGroupRef;
+		ListCell *lc2;			/* must not reuse the outer loop's lc */
+		int c = 1;
+
+		foreach(lc2, parse->targetList)
+		{
+			TargetEntry *tle = lfirst_node(TargetEntry, lc2);
+			if (tle->ressortgroupref == sortref)
+			{
+				exprs = lappend(exprs, tle->expr);
+				colnos = lappend_int(colnos, c);
+			}
+			++c;
+		}
+	}
+	rel->uniquekeys = lappend(rel->uniquekeys,
+							  makeUniqueKey(exprs,
+											colnos,
+											false, /* sortgroupclause can't be multi_nullvals */
+											relation_is_onerow(rel) /* should always be false here */
+								  ));
+}
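The mapping done in add_uniquekey_from_sortgroups — find, for each tleSortGroupRef in the sort/group clauses, the target-list entry carrying that ressortgroupref and record its 1-based column number — can be sketched standalone. Plain arrays stand in for the planner's Lists, and the function name is invented for illustration:

```c
#include <assert.h>

/*
 * Given the ressortgroupref of each target-list column (index 0 holds
 * column 1's ref, and so on), return the 1-based column number whose
 * ressortgroupref matches sortref, or 0 if none does.
 */
int colno_for_sortgroupref(const int *ressortgrouprefs, int ntargets,
						   int sortref)
{
	for (int c = 1; c <= ntargets; c++)
	{
		if (ressortgrouprefs[c - 1] == sortref)
			return c;
	}
	return 0;
}
```

These recovered column numbers are what end up in the UniqueKey's positions list, so a subquery's unique keys can later be re-expressed in the outer query's terms.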
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0ff660284..f8491e2bcf 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -439,6 +439,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
 	 * There may be references to the rel in root->fkey_list, but if so,
 	 * match_foreign_keys_to_quals() will get rid of them.
 	 */
+
+	root->all_baserels = bms_del_member(root->all_baserels, relid);
 }
 
 /*
@@ -574,49 +576,11 @@ reduce_unique_semijoins(PlannerInfo *root)
  * rel_supports_distinctness
  *		Could the relation possibly be proven distinct on some set of columns?
  *
- * This is effectively a pre-checking function for rel_is_distinct_for().
- * It must return true if rel_is_distinct_for() could possibly return true
- * with this rel, but it should not expend a lot of cycles.  The idea is
- * that callers can avoid doing possibly-expensive processing to compute
- * rel_is_distinct_for()'s argument lists if the call could not possibly
- * succeed.
  */
 static bool
 rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 {
-	/* We only know about baserels ... */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * For a plain relation, we only know how to prove uniqueness by
-		 * reference to unique indexes.  Make sure there's at least one
-		 * suitable unique index.  It must be immediately enforced, and if
-		 * it's a partial index, it must match the query.  (Keep these
-		 * conditions in sync with relation_has_unique_index_for!)
-		 */
-		ListCell   *lc;
-
-		foreach(lc, rel->indexlist)
-		{
-			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
-
-			if (ind->unique && ind->immediate &&
-				(ind->indpred == NIL || ind->predOK))
-				return true;
-		}
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Query	   *subquery = root->simple_rte_array[rel->relid]->subquery;
-
-		/* Check if the subquery has any qualities that support distinctness */
-		if (query_supports_distinctness(subquery))
-			return true;
-	}
-	/* We have no proof rules for any other rtekinds. */
-	return false;
+	return rel->uniquekeys != NIL;
 }
 
 /*
@@ -640,83 +604,33 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 static bool
 rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
 {
-	/*
-	 * We could skip a couple of tests here if we assume all callers checked
-	 * rel_supports_distinctness first, but it doesn't seem worth taking any
-	 * risk for.
-	 */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * Examine the indexes to see if we have a matching unique index.
-		 * relation_has_unique_index_for automatically adds any usable
-		 * restriction clauses for the rel, so we needn't do that here.
-		 */
-		if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
-			return true;
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Index		relid = rel->relid;
-		Query	   *subquery = root->simple_rte_array[relid]->subquery;
-		List	   *colnos = NIL;
-		List	   *opids = NIL;
-		ListCell   *l;
 
-		/*
-		 * Build the argument lists for query_is_distinct_for: a list of
-		 * output column numbers that the query needs to be distinct over, and
-		 * a list of equality operators that the output columns need to be
-		 * distinct according to.
-		 *
-		 * (XXX we are not considering restriction clauses attached to the
-		 * subquery; is that worth doing?)
-		 */
-		foreach(l, clause_list)
+	ListCell   *lc1, *lc2, *lc3;
+	foreach(lc1, rel->uniquekeys)
+	{
+		UniqueKey *uqk = lfirst_node(UniqueKey, lc1);
+		bool all_uqk_exprs_found = true;
+		foreach(lc2, uqk->exprs)
 		{
-			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
-			Oid			op;
-			Var		   *var;
-
-			/*
-			 * Get the equality operator we need uniqueness according to.
-			 * (This might be a cross-type operator and thus not exactly the
-			 * same operator the subquery would consider; that's all right
-			 * since query_is_distinct_for can resolve such cases.)  The
-			 * caller's mergejoinability test should have selected only
-			 * OpExprs.
-			 */
-			op = castNode(OpExpr, rinfo->clause)->opno;
-
-			/* caller identified the inner side for us */
-			if (rinfo->outer_is_left)
-				var = (Var *) get_rightop(rinfo->clause);
-			else
-				var = (Var *) get_leftop(rinfo->clause);
-
-			/*
-			 * We may ignore any RelabelType node above the operand.  (There
-			 * won't be more than one, since eval_const_expressions() has been
-			 * applied already.)
-			 */
-			if (var && IsA(var, RelabelType))
-				var = (Var *) ((RelabelType *) var)->arg;
-
-			/*
-			 * If inner side isn't a Var referencing a subquery output column,
-			 * this clause doesn't help us.
-			 */
-			if (!var || !IsA(var, Var) ||
-				var->varno != relid || var->varlevelsup != 0)
-				continue;
-
-			colnos = lappend_int(colnos, var->varattno);
-			opids = lappend_oid(opids, op);
+			Node *uq_expr = lfirst(lc2);
+			bool find_uq_exprs_in_clause_list = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *clause_expr = rinfo->outer_is_left ?
+					get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				if (equal(uq_expr, clause_expr))
+				{
+					find_uq_exprs_in_clause_list = true;
+					break;
+				}
+			}
+			if (!find_uq_exprs_in_clause_list)
+			{
+				all_uqk_exprs_found = false;
+				break;
+			}
 		}
-
-		if (query_is_distinct_for(subquery, colnos, opids))
+		if (all_uqk_exprs_found)
 			return true;
 	}
 	return false;
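The new rel_is_distinct_for logic above boils down to: the rel is provably distinct for a clause list if at least one UniqueKey has every one of its expressions matched by some clause. A minimal standalone sketch of that shape (strings stand in for expression nodes and equal() becomes strcmp(); all names are invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_EXPRS 4

/* One unique key: a NULL-terminated array of expression names. */
typedef struct
{
	const char *exprs[MAX_EXPRS];
} DemoUniqueKey;

static bool expr_in_clauses(const char *expr,
							const char **clauses, int nclauses)
{
	for (int i = 0; i < nclauses; i++)
		if (strcmp(expr, clauses[i]) == 0)
			return true;
	return false;
}

/* True if at least one unique key has every expression matched by a clause. */
bool rel_is_distinct_for_demo(const DemoUniqueKey *keys, int nkeys,
							  const char **clauses, int nclauses)
{
	for (int k = 0; k < nkeys; k++)
	{
		bool all_found = true;

		for (int e = 0; e < MAX_EXPRS && keys[k].exprs[e] != NULL; e++)
			if (!expr_in_clauses(keys[k].exprs[e], clauses, nclauses))
			{
				all_found = false;
				break;
			}
		if (all_found)
			return true;
	}
	return false;
}
```

With unique keys (pk) and (c, d), joining on both c and d proves distinctness, while joining on c alone covers neither key completely.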
@@ -972,6 +886,9 @@ innerrel_is_unique(PlannerInfo *root,
 	MemoryContext old_context;
 	ListCell   *lc;
 
+	if (relation_is_onerow(innerrel))
+		return true;
+
 	/* Certainly can't prove uniqueness when there are no joinclauses */
 	if (restrictlist == NIL)
 		return false;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e978b491f6..ca6303915c 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -830,6 +830,16 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		{
 			Node	   *qual = (Node *) lfirst(l);
 
+			/* Also collect the not-null info implied by this qual */
+			ListCell	*lc;
+			List		*non_nullable_vars = find_nonnullable_vars(qual);
+			foreach(lc, non_nullable_vars)
+			{
+				Var *var = lfirst_node(Var, lc);
+				RelOptInfo *rel = root->simple_rel_array[var->varno];
+				rel->notnullattrs = bms_add_member(rel->notnullattrs,
+												   var->varattno - FirstLowInvalidHeapAttributeNumber);
+			}
 			distribute_qual_to_rels(root, qual,
 									false, below_outer_join, JOIN_INNER,
 									root->qual_security_level,
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..6ad73cb57b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -213,19 +213,6 @@ query_planner(PlannerInfo *root,
 	 */
 	fix_placeholder_input_needed_levels(root);
 
-	/*
-	 * Remove any useless outer joins.  Ideally this would be done during
-	 * jointree preprocessing, but the necessary information isn't available
-	 * until we've built baserel data structures and classified qual clauses.
-	 */
-	joinlist = remove_useless_joins(root, joinlist);
-
-	/*
-	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
-	 * Likewise, this can't be done until now for lack of needed info.
-	 */
-	reduce_unique_semijoins(root);
-
 	/*
 	 * Now distribute "placeholders" to base rels as needed.  This has to be
 	 * done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b65abf6046..4fff019b52 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2386,6 +2386,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		add_path(final_rel, path);
 	}
 
+	simple_copy_uniquekeys(current_rel, final_rel);
+
 	/*
 	 * Generate partial paths for final_rel, too, if outer query levels might
 	 * be able to make use of them.
@@ -3810,6 +3812,23 @@ create_grouping_paths(PlannerInfo *root,
 	Query	   *parse = root->parse;
 	RelOptInfo *grouped_rel;
 	RelOptInfo *partially_grouped_rel;
+	List	*groupExprs = NIL;
+
+	if (root->parse->groupingSets == NIL)
+	{
+		groupExprs = get_sortgrouplist_exprs(parse->groupClause,
+											 parse->targetList);
+		/*
+		 * If the groupby clauses are already unique, the grouping node is
+		 * unnecessary when there are no aggregates, window functions or HAVING.
+		 */
+		if (groupExprs != NIL &&
+			!parse->hasAggs &&
+			!parse->hasWindowFuncs &&
+			parse->havingQual == NULL &&
+			relation_has_uniquekeys_for(root, input_rel, groupExprs))
+			return input_rel;
+	}
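The early exit above only fires when nothing else needs a grouping node. Pulled out as a predicate (a sketch with invented names, not planner code; the real test also requires groupingSets to be NIL and groupExprs to be non-empty):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * The grouping step may be skipped only when the GROUP BY exprs are already
 * unique on the input rel and no aggregate, window function or HAVING
 * clause still requires grouping to run.
 */
bool can_skip_grouping(bool group_exprs_unique, bool has_aggs,
					   bool has_window_funcs, bool has_having)
{
	return group_exprs_unique && !has_aggs &&
		!has_window_funcs && !has_having;
}
```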
 
 	/*
 	 * Create grouping relation to hold fully aggregated grouping and/or
@@ -3898,6 +3917,8 @@ create_grouping_paths(PlannerInfo *root,
 	}
 
 	set_cheapest(grouped_rel);
+
+	populate_grouprel_uniquekeys(root, grouped_rel);
 	return grouped_rel;
 }
 
@@ -4615,7 +4636,7 @@ create_window_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(window_rel);
-
+	simple_copy_uniquekeys(input_rel, window_rel);
 	return window_rel;
 }
 
@@ -4734,6 +4755,12 @@ create_distinct_paths(PlannerInfo *root,
 	bool		allow_hash;
 	Path	   *path;
 	ListCell   *lc;
+	List	   *distinctExprs =  get_sortgrouplist_exprs(parse->distinctClause,
+														 parse->targetList);
+
+	/* If the result is unique already, we just return the input_rel directly */
+	if (relation_has_uniquekeys_for(root, input_rel, distinctExprs))
+		return input_rel;
 
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
@@ -4771,10 +4798,6 @@ create_distinct_paths(PlannerInfo *root,
 		/*
 		 * Otherwise, the UNIQUE filter has effects comparable to GROUP BY.
 		 */
-		List	   *distinctExprs;
-
-		distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
-												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
 											  NULL);
@@ -4912,7 +4935,7 @@ create_distinct_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(distinct_rel);
-
+	populate_distinctrel_uniquekeys(root, input_rel, distinct_rel);
 	return distinct_rel;
 }
 
@@ -5060,6 +5083,8 @@ create_ordered_paths(PlannerInfo *root,
 	 */
 	Assert(ordered_rel->pathlist != NIL);
 
+	simple_copy_uniquekeys(input_rel, ordered_rel);
+
 	return ordered_rel;
 }
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 951aed80e7..8aa4d24cb0 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -689,6 +689,8 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	/* Undo effects of possibly forcing tuple_fraction to 0 */
 	root->tuple_fraction = save_fraction;
 
+	/* Add the UniqueKeys */
+	populate_unionrel_uniquiekeys(root, result_rel);
 	return result_rel;
 }
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index d82fc5ab8b..2373e39acf 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -460,6 +461,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->notnullattrs == NULL);
+	for(i = 0; i < relation->rd_att->natts; i++)
+	{
+		FormData_pg_attribute attr = relation->rd_att->attrs[i];
+		if (attr.attnotnull)
+			rel->notnullattrs = bms_add_member(rel->notnullattrs,
+											   attr.attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
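The notnullattrs bookkeeping in the hunk above offsets attribute numbers before storing them: attnums can be zero or negative (system columns), so each is shifted by -FirstLowInvalidHeapAttributeNumber to make every bitmapset index non-negative. A standalone sketch with a uint64 standing in for the Bitmapset (the constant below is illustrative only, not PostgreSQL's actual value):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIRST_LOW_INVALID_ATTNO (-8)	/* illustrative value only */

/* Record attnum as not-null; the offset keeps the bit index >= 0. */
uint64_t mark_notnull(uint64_t notnullattrs, int attnum)
{
	return notnullattrs |
		(UINT64_C(1) << (attnum - FIRST_LOW_INVALID_ATTNO));
}

bool is_notnull(uint64_t notnullattrs, int attnum)
{
	return (notnullattrs >> (attnum - FIRST_LOW_INVALID_ATTNO)) & 1;
}
```

Both a user column (attnum 1) and a system column (attnum -1) land at distinct non-negative bit positions.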
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 31d9aedeeb..fd050249d2 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -16,6 +16,7 @@
 
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 
 extern A_Expr *makeA_Expr(A_Expr_Kind kind, List *name,
@@ -105,4 +106,6 @@ extern GroupingSet *makeGroupingSet(GroupingSetKind kind, List *content, int loc
 
 extern VacuumRelation *makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols);
 
+extern UniqueKey *makeUniqueKey(List *exprs, List *positions,
+								bool multi_nullvals, bool onerow);
 #endif							/* MAKEFUNC_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 8a76afe8cc..679cc4cc9c 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -261,6 +261,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ceb809644..d777d896d2 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -687,6 +687,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	/* Not-null attrs; attnums offset by -FirstLowInvalidHeapAttributeNumber */
+	Bitmapset  *notnullattrs;
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
@@ -706,6 +708,7 @@ typedef struct RelOptInfo
 	QualCost	baserestrictcost;	/* cost of evaluating the above */
 	Index		baserestrict_min_security;	/* min security_level found in
 											 * baserestrictinfo */
+	List	   *uniquekeys;		/* List of UniqueKey */
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
 	bool		has_eclass_joins;	/* T means joininfo is incomplete */
@@ -1017,6 +1020,35 @@ typedef struct PathKey
 } PathKey;
 
 
+/*
+ * UniqueKey
+ *
+ * Represents the unique properties held by a RelOptInfo.
+ *
+ * exprs is a list of exprs which are unique on the current RelOptInfo.
+ * positions is a list of the locations of the corresponding exprs in the
+ * current reloptinfo->reltarget; it is used when we translate a UniqueKey
+ * out of a subquery.
+ * multi_nullvals: true means multiple null values may exist in these exprs,
+ * so uniqueness is not guaranteed in that case.  This field is needed for
+ * remove_useless_joins & reduce_unique_semijoins, where we don't mind such
+ * duplicated NULL values.  It is set to true in two cases: a unique key from
+ * a unique index whose column is nullable, and the nullable side of an
+ * outer join; see populate_joinrel_uniquekeys for details.
+ * onerow means the relation returns at most one row, e.g. a filter on a
+ * unique index, an aggregate without a grouping clause, or a join of two
+ * one-row relations.  As an optimization, when onerow is true we do not
+ * record every expr as a UniqueKey; we store exprs as NIL instead.
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+	List	   *exprs;
+	List	   *positions;
+	bool		multi_nullvals;
+	bool		onerow;
+} UniqueKey;
+
 /*
  * PathTarget
  *
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 14ea2766ad..621f54a9f8 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -528,6 +528,8 @@ extern bool list_member_ptr(const List *list, const void *datum);
 extern bool list_member_int(const List *list, int datum);
 extern bool list_member_oid(const List *list, Oid datum);
 
+extern bool list_is_subset(const List *members, const List *target);
+
 extern List *list_delete(List *list, void *datum);
 extern List *list_delete_ptr(List *list, void *datum);
 extern List *list_delete_int(List *list, int datum);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 9ab73bd20c..534cf04616 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -240,4 +240,38 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ */
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel,
+										List* unique_index_list);
+extern void populate_partitionedrel_uniquekeys(PlannerInfo *root,
+												RelOptInfo *rel,
+												List *childrels);
+extern void populate_distinctrel_uniquekeys(PlannerInfo *root,
+											RelOptInfo *inputrel,
+											RelOptInfo *distinctrel);
+extern void populate_grouprel_uniquekeys(PlannerInfo *root,
+										 RelOptInfo *grouprel);
+extern void populate_unionrel_uniquiekeys(PlannerInfo *root,
+										  RelOptInfo *unionrel);
+extern void simple_copy_uniquekeys(RelOptInfo *oldrel,
+								   RelOptInfo *newrel);
+extern void convert_subquery_uniquekeys(PlannerInfo *root,
+										RelOptInfo *currel,
+										RelOptInfo *sub_final_rel);
+extern void populate_joinrel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *rel1,
+										RelOptInfo *rel2,
+										List *restrictlist,
+										JoinType jointype);
+
+extern bool relation_has_uniquekeys_for(PlannerInfo *root,
+										RelOptInfo *rel,
+										List *exprs);
+extern bool relation_is_onerow(RelOptInfo *rel);
+
 #endif							/* PATHS_H */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 14cdcfcca6..42bd180895 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -870,14 +870,12 @@ explain (costs off)
   select distinct max(unique2) from tenk1;
                              QUERY PLAN                              
 ---------------------------------------------------------------------
- HashAggregate
-   Group Key: $0
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
                  Index Cond: (unique2 IS NOT NULL)
-   ->  Result
-(7 rows)
+(5 rows)
 
 select distinct max(unique2) from tenk1;
  max  
@@ -1036,7 +1034,7 @@ explain (costs off)
   select distinct min(f1), max(f1) from minmaxtest;
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
- Unique
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Merge Append
@@ -1059,10 +1057,7 @@ explain (costs off)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
-   ->  Sort
-         Sort Key: ($0), ($1)
-         ->  Result
-(26 rows)
+(23 rows)
 
 select distinct min(f1), max(f1) from minmaxtest;
  min | max 
@@ -1092,12 +1087,10 @@ create temp table t2 (x int, y int, z int, primary key (x, y));
 create temp table t3 (a int, b int, c int, primary key(a, b) deferrable);
 -- Non-primary-key columns can be removed from GROUP BY
 explain (costs off) select * from t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 -- No removal can happen if the complete PK is not present in GROUP BY
 explain (costs off) select a,c from t1 group by a,c,d;
@@ -1109,29 +1102,27 @@ explain (costs off) select a,c from t1 group by a,c,d;
 (3 rows)
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.y
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
                       QUERY PLAN                      
 ------------------------------------------------------
  HashAggregate
    Group Key: t1.a, t1.b, t2.x, t2.z
-   ->  Hash Join
+   ->  Hash Left Join
          Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
          ->  Seq Scan on t2
          ->  Hash
@@ -1161,12 +1152,10 @@ explain (costs off) select * from t1 group by a,b,c,d;
 
 -- Okay to remove columns if we're only querying the parent.
 explain (costs off) select * from only t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 create temp table p_t1 (
   a int,
@@ -1179,14 +1168,12 @@ create temp table p_t1_1 partition of p_t1 for values in(1);
 create temp table p_t1_2 partition of p_t1 for values in(2);
 -- Ensure we can remove non-PK columns for partitioned tables.
 explain (costs off) select * from p_t1 group by a,b,c,d;
-           QUERY PLAN           
---------------------------------
- HashAggregate
-   Group Key: p_t1.a, p_t1.b
-   ->  Append
-         ->  Seq Scan on p_t1_1
-         ->  Seq Scan on p_t1_2
-(5 rows)
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on p_t1_1
+   ->  Seq Scan on p_t1_2
+(3 rows)
 
 drop table t1 cascade;
 NOTICE:  drop cascades to table t1c
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 761376b007..54d987405a 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4349,11 +4349,11 @@ reset enable_nestloop;
 --
 begin;
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 -- all three cases should be optimizable into a simple seqscan
@@ -4415,36 +4415,35 @@ select d.* from d left join (select distinct * from b) s
 -- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
 -- but this happens too late for join removal in the outer plan level.)
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
-                QUERY PLAN                
-------------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Group
-         Group Key: b.id
-         ->  Index Scan using b_pkey on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.d, b.c_id
+                     ->  Seq Scan on b
 (8 rows)
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(9 rows)
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.c_id, b.d
+                     ->  Seq Scan on b
+(8 rows)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 11c6f50fbf..227bc27af4 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -306,3 +306,338 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+    QUERY PLAN    
+------------------
+ Seq Scan on uqk1
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+           QUERY PLAN            
+---------------------------------
+ HashAggregate
+   Group Key: d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+     QUERY PLAN     
+--------------------
+ Seq Scan on uqk1
+   Filter: (pk = 1)
+(2 rows)
+
+-- Test join
+-- both uqk1(c, d) and uqk2(pk) are unique keys, so distinct is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+           QUERY PLAN            
+---------------------------------
+ Hash Join
+   Hash Cond: (uqk1.a = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+   ->  Hash
+         ->  Seq Scan on uqk2
+(6 rows)
+
+-- Distinct is needed because of the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+                 QUERY PLAN                  
+---------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.pk
+         ->  Hash Right Join
+               Hash Cond: (uqk1.a = uqk2.pk)
+               ->  Seq Scan on uqk1
+               ->  Hash
+                     ->  Seq Scan on uqk2
+(8 rows)
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+ pk 
+----
+  1
+   
+(2 rows)
+
+-- Distinct is not needed since uqk1 is the left table in the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+           QUERY PLAN            
+---------------------------------
+ Sort
+   Sort Key: uqk1.c, uqk1.d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+ c | d 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+(3 rows)
+
+-- Distinct is not needed even without a join clause, since both unique keys appear in the targetlist
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+          QUERY PLAN          
+------------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+   ->  Materialize
+         ->  Seq Scan on uqk2
+(4 rows)
+
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+ a | pk | c | d | a | pk | c | d 
+---+----+---+---+---+----+---+---
+ 1 |  1 | 1 | 1 | 1 |  1 | 1 | 1
+ 1 |  1 | 1 | 1 | 4 |  4 | 4 | 4
+ 1 |  1 | 1 | 1 | 5 |  5 | 5 | 5
+ 2 |  2 | 2 | 2 | 1 |  1 | 1 | 1
+ 2 |  2 | 2 | 2 | 4 |  4 | 4 | 4
+ 2 |  2 | 2 | 2 | 5 |  5 | 5 | 5
+ 3 |  3 | 3 | 3 | 1 |  1 | 1 | 1
+ 3 |  3 | 3 | 3 | 4 |  4 | 4 | 4
+ 3 |  3 | 3 | 3 | 5 |  5 | 5 | 5
+(9 rows)
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+           QUERY PLAN           
+--------------------------------
+ Hash Semi Join
+   Hash Cond: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+   ->  Hash
+         ->  Seq Scan on uqk2
+(5 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+             QUERY PLAN             
+------------------------------------
+ Seq Scan on uqk1
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on uqk2
+(4 rows)
+
+-- Test UniqueKey for the one-row case; DISTINCT is not needed here either.
+-- uqk1.d is a uniquekey due to the one-row rule; uqk2.pk is the primary key
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+            QUERY PLAN             
+-----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+(5 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both uqk1.d and uqk2.c are uniquekeys due to the one-row rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+            QUERY PLAN            
+----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(6 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both UniqueKeys in the targetlist
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+        QUERY PLAN        
+--------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(5 rows)
+
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+ c | c 
+---+---
+ 2 | 1
+(1 row)
+
+-- Test SubQuery
+-- t2(a, c) is a unique key because of the GROUP BY
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+              QUERY PLAN               
+---------------------------------------
+ Hash Join
+   Hash Cond: ((sum(uqk2.pk)) = t1.pk)
+   ->  HashAggregate
+         Group Key: uqk2.a, uqk2.c
+         ->  Seq Scan on uqk2
+   ->  Hash
+         ->  Seq Scan on uqk1 t1
+(7 rows)
+
+-- Test partitioned tables
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+-- CREATE unique INDEX ON dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP INDEX dist_p_uk_b_c;
+-- we also support a unique index created on each child table
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- NOT ok, since dist_p1 has no such index
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ HashAggregate
+   Group Key: dist_p.a, dist_p.b, dist_p.c, dist_p.d
+   ->  Append
+         ->  Seq Scan on dist_p0 dist_p_1
+         ->  Seq Scan on dist_p1 dist_p_2
+(5 rows)
+
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP INDEX dist_p0_uk_bc;
+DROP INDEX dist_p1_uk_bc;
+-- the unique key is the same on all child tables, but it doesn't include the partkey, so NOT ok either.
+CREATE UNIQUE INDEX dist_p0_uk_c ON dist_p0(c);
+CREATE UNIQUE INDEX dist_p1_uk_c ON dist_p1(c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ HashAggregate
+   Group Key: dist_p.a, dist_p.b, dist_p.c, dist_p.d
+   ->  Append
+         ->  Seq Scan on dist_p0 dist_p_1
+         ->  Seq Scan on dist_p1 dist_p_2
+(5 rows)
+
+DROP TABLE dist_p;
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+DEALLOCATE pt;
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 9480abd577..3446c3e9fd 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -394,13 +394,13 @@ explain (costs off) select * from t1 group by a,b,c,d;
 explain (costs off) select a,c from t1 group by a,c,d;
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
 
 -- Cannot optimize when PK is deferrable
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 5fc6617369..d52a6052de 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1479,11 +1479,11 @@ reset enable_nestloop;
 begin;
 
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 
@@ -1516,13 +1516,13 @@ select d.* from d left join (select distinct * from b) s
 -- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
 -- but this happens too late for join removal in the outer plan level.)
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 33102744eb..72716217c0 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -135,3 +135,121 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+
+
+-- Test join
+-- both uqk1(c, d) and uqk2(pk) are unique keys, so distinct is not needed.
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+
+-- Distinct is needed because of the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+
+-- Distinct is not needed since uqk1 is the left table in the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+-- Distinct is not needed even without a join clause, since both unique keys appear in the targetlist
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+
+-- Test UniqueKey for the one-row case; DISTINCT is not needed here either.
+-- uqk1.d is a uniquekey due to the one-row rule; uqk2.pk is the primary key
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+
+-- Both uqk1.d and uqk2.c are uniquekeys due to the one-row rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+
+-- Both UniqueKeys in the targetlist
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+
+-- Test SubQuery
+-- t2(a, c) is a unique key because of the GROUP BY
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+
+-- Test partitioned tables
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+
+-- CREATE unique INDEX ON dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+DROP INDEX dist_p_uk_b_c;
+
+-- we also support a unique index created on each child table
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- NOT ok, since dist_p1 has no such index
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+
+DROP INDEX dist_p0_uk_bc;
+DROP INDEX dist_p1_uk_bc;
+
+-- the unique key is the same on all child tables, but it doesn't include the partkey, so NOT ok either.
+CREATE UNIQUE INDEX dist_p0_uk_c ON dist_p0(c);
+CREATE UNIQUE INDEX dist_p1_uk_c ON dist_p1(c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+
+DROP TABLE dist_p;
+
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+DEALLOCATE pt;
+
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
-- 
2.21.0

#10David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#9)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Fri, 3 Apr 2020 at 15:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

The updated patch should fix all the issues. See the comments below for more
information.

On Tue, Mar 31, 2020 at 9:44 AM David Rowley <dgrowleyml@gmail.com> wrote:

+ * can only guarantee the uniqueness without considering the null values. This
+ * field is necessary for remove_useless_join & reduce_unique_semijions since
+ * these cases don't care about the null values.

Why is the field which stores the nullability of the key required for
code that does not care about the nullability of the key?

Also please check your spelling of the word "join"

Actually I didn't find the spelling error for "join"..

It was in reduce_unique_semijions. That should be
reduce_unique_semijoins. I see you fixed it in the patch though.

3. remove_useless_groupby_columns maintains parse->constraintDeps
when it depends on a primary key, but UniqueKey doesn't maintain such data,
since we have a translation layer which should protect us from concurrency
and isolation issues. Do we need to do that in UniqueKey as well?

I'm pretty sure that code is pretty bogus in
remove_useless_groupby_columns(). It perhaps was just copied from
check_functional_grouping(), where it is required. Looks like the
(ahem) author of d4c3a156c got that wrong... :-(

The reason check_functional_grouping() needs it is for things like
creating a view with a GROUP BY clause that has a column in the SELECT
list that is functionally dependent on the GROUP BY columns, e.g.:

create table z (a int primary key, b int);
create view view_z as select a,b from z group by a;
alter table z drop constraint z_pkey;
ERROR: cannot drop constraint z_pkey on table z because other objects
depend on it
DETAIL: view view_z depends on constraint z_pkey on table z
HINT: Use DROP ... CASCADE to drop the dependent objects too.

Here that view would become invalid if the PK was dropped, so we must
record the dependency in that case. Doing so is what causes that
error message.

For just planner smarts such as LEFT JOIN removal, Unique Joins, and
all this Unique Key stuff, we really don't need to record the
dependency as if the index or constraint is dropped, then that'll
cause a relcache invalidation and we'll see the invalidation when we
attempt to execute the cached plan. That will cause the statement to
be re-planned and we'll not see the unique index when we do that.

We should probably just get rid of that code in
remove_useless_groupby_columns() to stop people getting confused about
that.
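
To make that invalidation behaviour concrete, here is a minimal psql sketch of
what the paragraph above describes (the table, index, and statement names are
made up, and the elided Unique node assumes the UniqueKey patch is applied):

```sql
CREATE TABLE z2 (a int, b int);
CREATE UNIQUE INDEX z2_a_uidx ON z2 (a);
ANALYZE z2;

PREPARE p AS SELECT DISTINCT a FROM z2;
EXPLAIN (COSTS OFF) EXECUTE p;   -- with the patch, no Unique node is needed

-- Dropping the index sends a relcache invalidation for z2, so the next
-- EXECUTE replans the statement and the planner no longer sees the index.
DROP INDEX z2_a_uidx;
EXPLAIN (COSTS OFF) EXECUTE p;   -- Unique/HashAggregate comes back
```

No pg_depend entry is required for this; the replan alone keeps the cached
plan correct.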

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#10)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

David Rowley <dgrowleyml@gmail.com> writes:

For just planner smarts such as LEFT JOIN removal, Unique Joins, and
all this Unique Key stuff, we really don't need to record the
dependency as if the index or constraint is dropped, then that'll
cause a relcache invalidation and we'll see the invalidation when we
attempt to execute the cached plan. That will cause the statement to
be re-planned and we'll not see the unique index when we do that.

You need to make sure that the thing you're concerned about will actually
cause a relcache invalidation of a table in the query. But yeah, if it
will then there's not a need to have any other invalidation mechanism.

(It occurs to me BTW that we've been overly conservative about using
NOT NULL constraints in planning, because of failing to consider that.
Addition or drop of NOT NULL has to cause a change in
pg_attribute.attnotnull, which will definitely cause a relcache inval
on its table, cf rules in CacheInvalidateHeapTuple(). So we *don't*
need to have a pg_constraint entry corresponding to the NOT NULL, as
we've mistakenly supposed in some past discussions.)

regards, tom lane
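
The same sketch works for the NOT NULL observation: changing a column's
pg_attribute.attnotnull causes a relcache invalidation on its table, which is
enough to replan any cached plan that relied on the constraint (hypothetical
names; the elided Unique node again assumes the patch):

```sql
CREATE TABLE t_nn (a int NOT NULL);
CREATE UNIQUE INDEX t_nn_a_uidx ON t_nn (a);

PREPARE q AS SELECT DISTINCT a FROM t_nn;
EXPLAIN (COSTS OFF) EXECUTE q;   -- DISTINCT elided: column is unique and NOT NULL

-- Updating pg_attribute.attnotnull invalidates t_nn's relcache entry
-- (cf. CacheInvalidateHeapTuple), so the next EXECUTE is replanned.
ALTER TABLE t_nn ALTER COLUMN a DROP NOT NULL;
EXPLAIN (COSTS OFF) EXECUTE q;   -- Unique node reappears
```

This is essentially what the patch's own generic-plan regression test
exercises via the distinct_v1 view.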

#12David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#11)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Fri, 3 Apr 2020 at 16:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:

(It occurs to me BTW that we've been overly conservative about using
NOT NULL constraints in planning, because of failing to consider that.
Addition or drop of NOT NULL has to cause a change in
pg_attribute.attnotnull, which will definitely cause a relcache inval
on its table, cf rules in CacheInvalidateHeapTuple(). So we *don't*
need to have a pg_constraint entry corresponding to the NOT NULL, as
we've mistakenly supposed in some past discussions.)

Agreed for remove_useless_groupby_columns(), but we'd need it if we
wanted to detect functional dependencies in
check_functional_grouping() using unique indexes.

#13Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#12)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Fri, Apr 3, 2020 at 12:08 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 3 Apr 2020 at 16:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:

(It occurs to me BTW that we've been overly conservative about using
NOT NULL constraints in planning, because of failing to consider that.
Addition or drop of NOT NULL has to cause a change in
pg_attribute.attnotnull, which will definitely cause a relcache inval
on its table, cf rules in CacheInvalidateHeapTuple(). So we *don't*
need to have a pg_constraint entry corresponding to the NOT NULL, as
we've mistakenly supposed in some past discussions.)

Agreed for remove_useless_groupby_columns(), but we'd need it if we
wanted to detect functional dependencies in
check_functional_grouping() using unique indexes.

Thanks for the explanation. I will add the removal in the next version of
this patch.

Best Regards
Andy Fan

#14David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#13)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Fri, 3 Apr 2020 at 21:56, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Fri, Apr 3, 2020 at 12:08 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 3 Apr 2020 at 16:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:

(It occurs to me BTW that we've been overly conservative about using
NOT NULL constraints in planning, because of failing to consider that.
Addition or drop of NOT NULL has to cause a change in
pg_attribute.attnotnull, which will definitely cause a relcache inval
on its table, cf rules in CacheInvalidateHeapTuple(). So we *don't*
need to have a pg_constraint entry corresponding to the NOT NULL, as
we've mistakenly supposed in some past discussions.)

Agreed for remove_useless_groupby_columns(), but we'd need it if we
wanted to detect functional dependencies in
check_functional_grouping() using unique indexes.

Thanks for the explanation. I will add the removal in the next version of this
patch.

There's no need for this patch to touch
remove_useless_groupby_columns(). Fixes for that should be considered
independently and *possibly* even backpatched.

#15David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#9)
2 attachment(s)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Fri, 3 Apr 2020 at 15:18, Andy Fan <zhihui.fan1213@gmail.com> wrote:

All above 4 item Done.

Just to explain my view on this going forward for PG14. I do plan to
do a more thorough review of this soon. I wasn't so keen on pursuing
this for PG13 as the skip scans patch [1] needs to use the same
infrastructure this patch has added and it does not, yet.

The infrastructure (knowing the unique properties of a RelOptInfo), as
provided by the patch Andy has been working on, which is based on my
rough prototype version, I believe should be used for the skip scans
patch as well. I understand that as skip scans currently stands,
Jesper has done quite a bit of work to add the UniqueKeys, however,
this was unfortunately based on some early description of UniqueKeys
where I had thought that we could just store EquivalenceClasses. I no
longer think that's the case, and I believe the implementation that we
require is something more along the lines of Andy's latest version of
the patch. However, I've not quite stared at it long enough to be
highly confident in that.

I'd like to strike up a bit of a plan to move both Andy's work and the
Skip scans work forward for PG14.

Here are my thoughts so far:

1. Revise v4 of the remove-DISTINCT patch to split it into two pieces.

0001 should add the UniqueKey code but not any additional planner
smarts to use it (i.e., the GROUP BY / DISTINCT elimination parts).
Join removals and Unique joins should use UniqueKeys in this patch.
0002 should add back the GROUP BY / DISTINCT smarts and add whatever
tests should be added for that and include updating existing expected
results and modifying any tests which no longer properly test what
they're meant to be testing.

I've done this with the attached patch.

2. David / Jesper to look at 0001 and build or align the existing skip
scans 0001 patch to make use of Andy's 0001 patch. This will require
tagging UniqueKeys onto Paths, not just RelOptInfos, plus a bunch of
other work.

Clearly UniqueKeys must suit both needs and since we have two
different implementations each providing some subset of the features,
then clearly we're not yet ready to move both skip scans and this
patch forward together. We need to align that and move both patches
forward together. Hopefully, the attached 0001 patch helps move that
along.

While I'm here, a quick review of Andy's v4 patch. I didn't address
any of this in the attached v5. These are only based on what I saw
when shuffling some code around. It's not an in-depth review.

1. Out of date comment in join.sql

-- join removal is not possible when the GROUP BY contains a column that is
-- not in the join condition. (Note: as of 9.6, we notice that b.id is a
-- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
-- but this happens too late for join removal in the outer plan level.)
explain (costs off)
select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
on d.a = s.d;

You've changed the GROUP BY clause so it does not include b.id, so the
Note in the comment is now misleading.

2. I think 0002 is overly restrictive in its demands that
parse->hasAggs must be false. We should be able to just use a Group
Aggregate with unsorted input when the input_rel is unique on the
GROUP BY clause. This will save on hashing and sorting. Basically
similar to what we do for when a query contains aggregates without any
GROUP BY.

3. I don't quite understand why you changed this to a right join:

 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y

Perhaps this change is left over from some previous version of the patch?

[1]: https://commitfest.postgresql.org/27/1741/

Attachments:

v5-0001-Introduce-UniqueKeys-to-determine-RelOptInfo-uniq.patchapplication/octet-stream; name=v5-0001-Introduce-UniqueKeys-to-determine-RelOptInfo-uniq.patchDownload
From 2a775e82d7b3a6aac8a90ee73179c83fd55bc7b6 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 14 Apr 2020 19:44:15 +1200
Subject: [PATCH v5 1/2] Introduce UniqueKeys to determine RelOptInfo unique
 properties

---
 .../postgres_fdw/expected/postgres_fdw.out    |    4 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    1 +
 src/backend/nodes/list.c                      |   31 +
 src/backend/nodes/makefuncs.c                 |   16 +
 src/backend/optimizer/path/Makefile           |    3 +-
 src/backend/optimizer/path/allpaths.c         |   25 +-
 src/backend/optimizer/path/joinrels.c         |    2 +
 src/backend/optimizer/path/uniquekeys.c       | 1110 +++++++++++++++++
 src/backend/optimizer/plan/analyzejoins.c     |  143 +--
 src/backend/optimizer/plan/initsplan.c        |   10 +
 src/backend/optimizer/plan/planmain.c         |   13 -
 src/backend/optimizer/plan/planner.c          |   10 +-
 src/backend/optimizer/prep/prepunion.c        |    2 +
 src/backend/optimizer/util/plancat.c          |   10 +
 src/include/nodes/makefuncs.h                 |    3 +
 src/include/nodes/nodes.h                     |    1 +
 src/include/nodes/pathnodes.h                 |   32 +
 src/include/nodes/pg_list.h                   |    2 +
 src/include/optimizer/paths.h                 |   34 +
 src/test/regress/expected/aggregates.out      |   10 +-
 src/test/regress/expected/join.out            |   55 +-
 src/test/regress/sql/aggregates.sql           |    6 +-
 src/test/regress/sql/join.sql                 |   12 +-
 23 files changed, 1360 insertions(+), 175 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekeys.c

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 62c2697920..e941b7e538 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1378,6 +1378,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1
 (8 rows)
 
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY:  The inner table in the plan is "T 3", which has a primary key on c1; that's why the newly added "Inner Unique: true" appears.
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
                                                                                                                                                                                              QUERY PLAN                                                                                                                                                                                             
@@ -1386,6 +1387,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
    Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
    ->  Nested Loop
          Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+         Inner Unique: true
          ->  Foreign Scan
                Output: ft4.c1, ft4.*, ft5.c1, ft5.*
                Relations: (public.ft4) FULL JOIN (public.ft5)
@@ -1410,7 +1412,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
                ->  Seq Scan on "S 1"."T 3"
                      Output: "T 3".c1, "T 3".ctid
                      Filter: ("T 3".c1 = 50)
-(28 rows)
+(29 rows)
 
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
  c1 | a  | b  
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..a42cfa134d 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -453,6 +453,7 @@ EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY: The inner table in the plan is "T 3", which has a primary key on c1; that's why the plan now shows "Inner Unique: true".
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 80fa8c84e4..aa47eaed36 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -687,6 +687,37 @@ list_member_oid(const List *list, Oid datum)
 	return false;
 }
 
+/*
+ * Return true iff there is an equal member in target for every
+ * member in members
+ */
+bool
+list_is_subset(const List *members, const List *target)
+{
+	const ListCell	*lc1, *lc2;
+
+	Assert(IsPointerList(members));
+	Assert(IsPointerList(target));
+	check_list_invariants(members);
+	check_list_invariants(target);
+
+	foreach(lc1, members)
+	{
+		bool found = false;
+		foreach(lc2, target)
+		{
+			if (equal(lfirst(lc1), lfirst(lc2)))
+			{
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			return false;
+	}
+	return true;
+}
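For review purposes, the semantics of the new list_is_subset can be sketched in Python (illustrative only; the real code compares planner node trees with equal(), not plain equality):

```python
def list_is_subset(members, target):
    # True iff every element of members has an equal element in target,
    # mirroring the O(n*m) pairwise-equality scan of the C helper.
    return all(any(m == t for t in target) for m in members)
```

Ordering and duplicates don't matter, only membership; an empty members list is trivially a subset.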
+
 /*
  * Delete the n'th cell (counting from 0) in list.
  *
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index b442b5a29e..fe2d416293 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -812,3 +812,19 @@ makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols)
 	v->va_cols = va_cols;
 	return v;
 }
+
+
+/*
+ * makeUniqueKey
+ */
+UniqueKey*
+makeUniqueKey(List *exprs, List* positions, bool multi_nullvals, bool onerow)
+{
+	UniqueKey * ukey = makeNode(UniqueKey);
+	Assert(list_length(exprs) == list_length(positions));
+	ukey->exprs = exprs;
+	ukey->positions = positions;
+	ukey->multi_nullvals = multi_nullvals;
+	ukey->onerow = onerow;
+	return ukey;
+}
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..7b9820c25f 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekeys.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 255f56b827..166e3bd0ba 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -39,6 +39,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
@@ -222,13 +223,24 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	set_base_rel_pathlists(root);
 
 	/*
-	 * Generate access paths for the entire join tree.
+	 * Remove any useless outer joins.  Ideally this would be done during
+	 * jointree preprocessing, but the necessary information isn't available
+	 * until we've built baserel data structures, classified qual clauses
+	 * and built uniquekeys.
 	 */
-	rel = make_rel_from_joinlist(root, joinlist);
+	joinlist = remove_useless_joins(root, joinlist);
+
+	/*
+	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
+	 * Likewise, this can't be done until now for lack of needed info.
+	 */
+	reduce_unique_semijoins(root);
 
 	/*
-	 * The result should join all and only the query's base rels.
+	 * Generate access paths for the entire join tree.
 	 */
+	rel = make_rel_from_joinlist(root, joinlist);
+
 	Assert(bms_equal(rel->relids, root->all_baserels));
 
 	return rel;
@@ -786,6 +798,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
+
+	/* Set UniqueKeys for this relation */
+	populate_baserel_uniquekeys(root, rel, rel->indexlist);
 }
 
 /*
@@ -1276,6 +1291,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Add paths to the append relation. */
 	add_paths_to_append_rel(root, rel, live_childrels);
+	if (IS_PARTITIONED_REL(rel))
+		populate_partitionedrel_uniquekeys(root, rel, live_childrels);
 }
 
 
@@ -2349,6 +2366,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 										  pathkeys, required_outer));
 	}
 
+	convert_subquery_uniquekeys(root, rel, sub_final_rel);
+
 	/* If outer rel allows parallelism, do same for partial paths. */
 	if (rel->consider_parallel && bms_is_empty(required_outer))
 	{
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 4e1650994d..c0d2332caf 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -924,6 +924,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
 
 	/* Apply partitionwise join technique, if possible. */
 	try_partitionwise_join(root, rel1, rel2, joinrel, sjinfo, restrictlist);
+
+	populate_joinrel_uniquekeys(root, joinrel, rel1, rel2, restrictlist, sjinfo->jointype);
 }
 
 
diff --git a/src/backend/optimizer/path/uniquekeys.c b/src/backend/optimizer/path/uniquekeys.c
new file mode 100644
index 0000000000..4cba7c9698
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekeys.c
@@ -0,0 +1,1110 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekeys.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/optimizer.h"
+#include "rewrite/rewriteManip.h"
+
+
+/*
+ * This struct helps populate_joinrel_uniquekeys.
+ * added_to_joinrel is set to true once a uniquekey has been added to the
+ * joinrel.  For a joinrel, if both sides have a UniqueKey, then their
+ * combination is unique for the joinrel as well, but we don't need to add
+ * it if either of them has been added to the joinrel already.  We use this
+ * struct to maintain such info.
+ */
+typedef struct UniqueKeyContextData
+{
+	UniqueKey	*uniquekey;
+	/* Set to true if the unique key has been added to joinrel->uniquekeys */
+	bool	added_to_joinrel;
+	/* If this uniquekey is still useful after join */
+	bool	useful;
+} *UniqueKeyContext;
+
+
+static List *gather_mergeable_baserestrictlist(RelOptInfo *rel);
+static List *gather_mergeable_joinclauses(RelOptInfo *joinrel,
+										  RelOptInfo *rel1,
+										  RelOptInfo *rel2,
+										  List *restrictlist,
+										  JoinType jointype);
+static bool match_index_to_baserestrictinfo(IndexOptInfo *unique_ind,
+											List *restrictlist);
+static List *initililze_uniquecontext_for_joinrel(RelOptInfo *joinrel,
+												  RelOptInfo *inputrel);
+
+static bool innerrel_keeps_unique(PlannerInfo *root,
+								  RelOptInfo *outerrel,
+								  RelOptInfo *innerrel,
+								  List *restrictlist,
+								  bool reverse);
+static bool clause_sides_match_join(RestrictInfo *rinfo,
+									Relids outerrelids,
+									Relids innerrelids);
+static void add_uniquekey_from_index(RelOptInfo *rel,
+									 IndexOptInfo *unique_index);
+static void add_uniquekey_for_onerow(RelOptInfo *rel);
+
+/* Used for unique indexes checking for partitioned table */
+static bool index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind);
+static IndexOptInfo *simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+													 IndexOptInfo *from);
+static bool simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2);
+static void adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+											  RelOptInfo *childrel,
+											  List **global_unique_index);
+/* Helper function for groupres/distinctrel */
+static void add_uniquekey_from_sortgroups(PlannerInfo *root,
+										  RelOptInfo *rel,
+										  List *sortgroups);
+
+/*
+ * populate_baserel_uniquekeys
+ *		Populate 'baserel's uniquekeys list by looking at the rel's unique
+ *		indexes and baserestrictinfo.
+ */
+void
+populate_baserel_uniquekeys(PlannerInfo *root,
+							RelOptInfo *baserel,
+							List *indexlist)
+{
+	ListCell *lc;
+	List	*restrictlist = gather_mergeable_baserestrictlist(baserel);
+	bool	return_one_row = false;
+	List	*matched_uk_indexes = NIL;
+
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	if (baserel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+		/*
+		 * Setting UniqueKey on a member rel is useless; we have to recompute
+		 * it at the upper level, see populate_partitionedrel_uniquekeys.
+		 */
+		return;
+
+	foreach(lc, indexlist)
+	{
+		IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		if (match_index_to_baserestrictinfo(ind, restrictlist))
+		{
+			return_one_row = true;
+			break;
+		}
+
+		if (ind->indexprs != NIL)
+			/* We can't guarantee whether an expression can return NULL, so ignore such indexes */
+			continue;
+		matched_uk_indexes = lappend(matched_uk_indexes, ind);
+	}
+
+	if (return_one_row)
+	{
+		add_uniquekey_for_onerow(baserel);
+	}
+	else
+	{
+		foreach(lc, matched_uk_indexes)
+			add_uniquekey_from_index(baserel, lfirst_node(IndexOptInfo, lc));
+	}
+}
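The decision the function above makes can be sketched roughly as follows (Python; names and data shapes are mine, not from the patch — an index is "pinned" when every column has a col = Const restriction):

```python
def baserel_uniquekeys(unique_indexes, const_cols):
    # If any unique index is fully pinned by col = Const restrictions,
    # the scan returns at most one row (the "onerow" case); otherwise
    # every usable unique index contributes one UniqueKey.
    for idx in unique_indexes:
        if all(col in const_cols for col in idx):
            return "onerow"
    return [list(idx) for idx in unique_indexes]
```

The onerow case deliberately wins over individual keys: one row makes every column set unique, so no per-index keys are needed.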
+
+
+/*
+ * populate_partitionedrel_uniquekeys
+ * A unique index can be used for a UniqueKey only if:
+ * 1). It includes the partition keys.
+ * 2). All the childrels have the same index.
+ */
+void
+populate_partitionedrel_uniquekeys(PlannerInfo *root,
+								   RelOptInfo *rel,
+								   List *childrels)
+{
+	ListCell	*lc;
+	List	*global_unique_indexlist = NIL;
+	RelOptInfo *childrel;
+	bool is_first = true;
+
+	Assert(IS_PARTITIONED_REL(rel));
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	if (childrels == NIL)
+		return;
+
+	childrel = linitial_node(RelOptInfo, childrels);
+	foreach(lc, childrel->indexlist)
+	{
+		IndexOptInfo *ind = lfirst(lc);
+		IndexOptInfo *global_ind;
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		global_ind = simple_copy_indexinfo_to_parent(rel, ind);
+		/*
+		 * If the unique index doesn't contain partkey, then it is unique
+		 * on this partition only, so it is useless for us.
+		 */
+		if (!index_constains_partkey(rel, global_ind))
+			continue;
+		global_unique_indexlist = lappend(global_unique_indexlist,  global_ind);
+	}
+
+	/* Fast path */
+	if (global_unique_indexlist == NIL)
+		return;
+
+	foreach(lc, childrels)
+	{
+		RelOptInfo *child = lfirst(lc);
+		if (is_first)
+		{
+			is_first = false;
+			continue;
+		}
+		adjust_partition_unique_indexlist(rel, child, &global_unique_indexlist);
+	}
+
+	/*
+	 * Now we have the unique indexes which are exactly the same on all
+	 * childrels; set the UniqueKeys just as for a non-partitioned table.
+	 */
+	populate_baserel_uniquekeys(root, rel, global_unique_indexlist);
+}
+
+
+/*
+ * populate_distinctrel_uniquekeys
+ */
+void
+populate_distinctrel_uniquekeys(PlannerInfo *root,
+								RelOptInfo *inputrel,
+								RelOptInfo *distinctrel)
+{
+	/* The unique key before the distinct is still valid */
+	distinctrel->uniquekeys = list_copy(inputrel->uniquekeys);
+	add_uniquekey_from_sortgroups(root, distinctrel, root->parse->distinctClause);
+}
+
+/*
+ * populate_grouprel_uniquekeys
+ */
+void
+populate_grouprel_uniquekeys(PlannerInfo *root,
+							 RelOptInfo *grouprel)
+{
+	Query *parse = root->parse;
+	if (parse->hasTargetSRFs)
+		return;
+
+	if (parse->groupingSets)
+		return;
+
+	/* A Normal group by without grouping set */
+	if (parse->groupClause)
+		add_uniquekey_from_sortgroups(root,
+									  grouprel,
+									  root->parse->groupClause);
+	else
+		/* It has aggregation but no GROUP BY, so exactly one row is returned */
+		add_uniquekey_for_onerow(grouprel);
+}
+
+/*
+ * simple_copy_uniquekeys
+ * Using a function for this one-liner makes it easy to find the places
+ * where we simply copied the uniquekeys.
+ */
+void
+simple_copy_uniquekeys(RelOptInfo *oldrel,
+					   RelOptInfo *newrel)
+{
+	newrel->uniquekeys = oldrel->uniquekeys;
+}
+
+/*
+ *  populate_unionrel_uniquiekeys
+ */
+void
+populate_unionrel_uniquiekeys(PlannerInfo *root,
+							  RelOptInfo *unionrel)
+{
+	ListCell	*lc;
+	List	*exprs = NIL;
+	List	*colnos = NIL;
+	int i = 1;
+
+	Assert(unionrel->uniquekeys == NIL);
+
+	foreach(lc,  unionrel->reltarget->exprs)
+	{
+		exprs = lappend(exprs, lfirst(lc));
+		colnos = lappend_int(colnos, i);
+		i++;
+	}
+	unionrel->uniquekeys = lappend(unionrel->uniquekeys,
+								   makeUniqueKey(exprs, colnos, true, false));
+}
+
+/*
+ * populate_joinrel_uniquekeys
+ *
+ * Populate uniquekeys for the joinrel.  We check each input relation to see
+ * if its UniqueKey is still valid via innerrel_keeps_unique; if so, we add
+ * it to the joinrel.  The multi_nullvals field is changed from false to true
+ * for some outer join cases.
+ *
+ * For a uniquekey of either input rel which can't stay unique after the join,
+ * we still check whether the combination of the UniqueKeys from both sides is
+ * useful for us.  If so, we add it to the joinrel as well; multi_nullvals is
+ * set to true if either side has multi_nullvals equal to true.
+ */
+void
+populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List *restrictlist, JoinType jointype)
+{
+	ListCell *lc, *lc2;
+	List	*clause_list = NIL;
+	List	*outerrel_ukey_ctx;
+	List	*innerrel_ukey_ctx;
+	bool	outer_is_onerow, inner_is_onerow;
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	/* Only the outerrel's UniqueKeys are relevant for SEMI/ANTI joins */
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+	{
+		foreach(lc, outerrel->uniquekeys)
+		{
+			UniqueKey	*uniquekey = lfirst_node(UniqueKey, lc);
+			if (list_is_subset(uniquekey->exprs, joinrel->reltarget->exprs))
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, uniquekey);
+		}
+		return;
+	}
+
+	/* Fast path */
+	if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+		return;
+
+	outer_is_onerow = relation_is_onerow(outerrel);
+	inner_is_onerow = relation_is_onerow(innerrel);
+
+	outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel, outerrel);
+	innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel, innerrel);
+
+	clause_list = gather_mergeable_joinclauses(joinrel, outerrel, innerrel,
+											   restrictlist, jointype);
+
+
+	if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+	{
+		foreach(lc, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				/* The UniqueKey on baserel is not useful on the joinrel */
+				ctx->useful = false;
+				continue;
+			}
+			if ((jointype == JOIN_LEFT || jointype == JOIN_FULL) && !ctx->uniquekey->multi_nullvals)
+			{
+				/* Change multi_nullvals to true in this case */
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+											  makeUniqueKey(ctx->uniquekey->exprs,
+															ctx->uniquekey->positions,
+															true,
+															false));
+			}
+			else if (inner_is_onerow)
+			{
+				/*
+				 * Since rows in the innerrel can't be duplicated, if the
+				 * innerrel is onerow the join result is onerow as well.
+				 * Note: onerow implies multi_nullvals = false.
+				 */
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else
+			{
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, ctx->uniquekey);
+				ctx->added_to_joinrel = true;
+			}
+		}
+	}
+
+	if (innerrel_keeps_unique(root, innerrel, outerrel, clause_list, true))
+	{
+		foreach(lc, outerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+			/*
+			 * Under JOIN_FULL the outer side's columns can be NULL-extended
+			 * for unmatched inner rows, so NULL values can be duplicated;
+			 * mark multi_nullvals in that case (mirroring the inner-side
+			 * branch above).
+			 */
+			if (jointype == JOIN_FULL && !ctx->uniquekey->multi_nullvals)
+			{
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+											  makeUniqueKey(ctx->uniquekey->exprs,
+															ctx->uniquekey->positions,
+															true,
+															false));
+
+			}
+			else if (outer_is_onerow)
+			{
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else
+			{
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, ctx->uniquekey);
+				ctx->added_to_joinrel = true;
+			}
+		}
+	}
+
+	/*
+	 * The combination of the UniqueKeys from both sides is unique as well,
+	 * regardless of the join type, but don't bother to add it if a subset of
+	 * it has been added already.
+	 */
+	foreach(lc, outerrel_ukey_ctx)
+	{
+		UniqueKeyContext ctx1 = (UniqueKeyContext) lfirst(lc);
+		if (ctx1->added_to_joinrel || !ctx1->useful)
+			continue;
+		foreach(lc2, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx2 = (UniqueKeyContext) lfirst(lc2);
+			List	*exprs = NIL, *colnos = NIL;
+			bool multi_nullvals;
+			if (ctx2->added_to_joinrel || !ctx2->useful)
+				continue;
+			exprs = list_copy(ctx1->uniquekey->exprs);
+			colnos = list_copy(ctx1->uniquekey->positions);
+			exprs = list_concat(exprs, ctx2->uniquekey->exprs);
+			colnos = list_concat(colnos, ctx2->uniquekey->positions);
+
+			multi_nullvals = ctx1->uniquekey->multi_nullvals || ctx2->uniquekey->multi_nullvals;
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+										  makeUniqueKey(exprs,
+														colnos,
+														multi_nullvals,
+														/* All onerow cases have been handled above */
+														false));
+		}
+	}
+}
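The final combination loop can be sketched like this (Python, illustrative; the real code also skips keys already added to the joinrel and propagates multi_nullvals):

```python
def combine_uniquekeys(outer_keys, inner_keys):
    # The union of one UniqueKey from each side is unique over the join
    # result regardless of join type: fixing the outer key pins at most
    # one outer row, fixing the inner key pins at most one inner row.
    return [sorted(set(ok) | set(ik))
            for ok in outer_keys for ik in inner_keys]
```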
+
+
+/*
+ * Used to avoid multiple scans of rel->reltarget->exprs; see convert_subquery_uniquekeys
+ */
+typedef struct SubqueryUniqueKeyData
+{
+	/*
+	 * Only a Var referencing a subquery's unique column is unique as well;
+	 * we can't guarantee anything for other expressions.
+	 */
+	Var *var;
+
+	/* The position of the var in the rel->reltarget */
+	int pos;
+} *SubqueryUniqueKeyContext;
+
+/*
+ * convert_subquery_uniquekeys
+ *
+ * currel is the RelOptInfo at the current level; sub_final_rel comes from
+ * fetch_upper_rel.  We need to convert the UniqueKey from sub_final_rel to
+ * currel via the positions info in the UniqueKey.  Example:
+ *
+ * select t2.colx from t1, (select max(y), colx from t3 group by colx) t2 where ..
+ * The UniqueKey in sub_final_rel is Var(varno=1, varattno=N), position=2.
+ * The UniqueKey in currel will be Var(varno=2, varattno=2), position=1.
+ */
+void convert_subquery_uniquekeys(PlannerInfo *root,
+								 RelOptInfo *currel,
+								 RelOptInfo *sub_final_rel)
+{
+	SubqueryUniqueKeyContext *ctx_array;
+	SubqueryUniqueKeyContext ctx;
+	Index max_colno_subq = 0;
+	ListCell	*lc, *lc2;
+	int pos = 0;
+
+	if (sub_final_rel->uniquekeys == NIL)
+		/* This should be a common case */
+		return;
+
+	if (relation_is_onerow(sub_final_rel))
+	{
+		add_uniquekey_for_onerow(currel);
+		return;
+	}
+	/*
+	 * Calculate max_colno in the subquery.  In fact we could get this from
+	 * list_length(sub_final_rel->reltarget->exprs); however, reltarget is
+	 * not set on the UPPERREL_FINAL relation, so do it this way.
+	 */
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey * ukey = lfirst_node(UniqueKey, lc);
+		foreach(lc2, ukey->positions)
+		{
+			Index colno = lfirst_int(lc2);
+			if (max_colno_subq < colno)
+				max_colno_subq = colno;
+		}
+	}
+
+	Assert(max_colno_subq > 0);
+	ctx_array = palloc0(sizeof(SubqueryUniqueKeyContext) * (max_colno_subq + 1));
+
+	/*
+	 * Create an array entry for each Var in currel->reltarget->exprs; the
+	 * array index is the colno in the subquery, so that we can find the
+	 * expr quickly given a subquery colno.
+	 */
+	foreach(lc, currel->reltarget->exprs)
+	{
+		Var *var;
+		int colno_subq;
+		pos++;
+		if (!IsA(lfirst(lc), Var))
+			continue;
+
+		var = lfirst_node(Var, lc);
+		colno_subq = var->varattno;
+		if (colno_subq > max_colno_subq)
+			continue;
+		ctx_array[colno_subq] = palloc0(sizeof(struct SubqueryUniqueKeyData));
+		ctx = ctx_array[colno_subq]; /* corresponding to subquery's uniquekey->positions[x] */
+		ctx->pos = pos; /* the position in current targetlist,  will be used to set UniqueKey */
+		ctx->var = var;
+	}
+
+	/* Convert the UniqueKey from sub_final_rel to currel */
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey * ukey = lfirst_node(UniqueKey, lc);
+		bool uniquekey_useful = true;
+		List	*exprs = NIL;
+		List	*colnos = NIL;
+		foreach(lc2, ukey->positions)
+		{
+			Index sub_colno = lfirst_int(lc2);
+			ctx = ctx_array[sub_colno];
+			if (ctx == NULL)
+			{
+				/* The column is not used outside */
+				uniquekey_useful = false;
+				break;
+			}
+			exprs = lappend(exprs, ctx->var);
+			colnos = lappend_int(colnos, ctx->pos);
+		}
+		if (uniquekey_useful)
+			currel->uniquekeys = lappend(currel->uniquekeys,
+										 makeUniqueKey(exprs,
+													   colnos,
+													   ukey->multi_nullvals,
+													   ukey->onerow));
+	}
+}
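The position remapping above can be sketched in Python (illustrative only; `outer_ref_cols[i]` stands for the subquery column referenced by the (i+1)-th outer targetlist entry, with non-Var entries simply absent):

```python
def remap_subquery_uniquekeys(subq_keys, outer_ref_cols):
    # Each key is a list of 1-based column positions in the subquery's
    # targetlist.  A key survives only if every one of its columns is
    # referenced by the outer query's targetlist.
    converted = []
    for key in subq_keys:
        positions = []
        for sub_colno in key:
            if sub_colno in outer_ref_cols:
                # 1-based position of the referencing entry outside
                positions.append(outer_ref_cols.index(sub_colno) + 1)
            else:
                break  # column not used outside: key is useless here
        else:
            converted.append(positions)
    return converted
```

For the example in the comment, the subquery's key is its column 2 (colx); the outer query selects only colx, so the key becomes position 1 outside.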
+
+
+/*
+ * innerrel_keeps_unique
+ *
+ * Check if a UniqueKey of the innerrel is still valid after the join.  The
+ * innerrel's UniqueKey remains valid if a clause "innerrel's uniquekey
+ * mergeop outerrel's uniquekey" exists in clause_list.
+ * Note: the clause_list must be a list of mergeable restrictinfo already.
+ */
+static bool
+innerrel_keeps_unique(PlannerInfo *root,
+					  RelOptInfo *outerrel,
+					  RelOptInfo *innerrel,
+					  List *clause_list,
+					  bool reverse)
+{
+	ListCell	*lc, *lc2, *lc3;
+
+	if (outerrel->uniquekeys == NIL || innerrel->uniquekeys == NIL)
+		return false;
+
+	if (relation_is_onerow(innerrel))
+		return true;
+
+	foreach(lc, outerrel->uniquekeys)
+	{
+		List	*outer_uq_exprs = lfirst_node(UniqueKey, lc)->exprs;
+		bool clauselist_matchs_all_exprs = true;
+		foreach(lc2, outer_uq_exprs)
+		{
+			Node *outer_uq_expr = lfirst(lc2);
+			bool find_uq_expr_in_clauselist = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *outer_expr;
+				if (reverse)
+					outer_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				else
+					outer_expr = rinfo->outer_is_left ? get_leftop(rinfo->clause) : get_rightop(rinfo->clause);
+				if (equal(outer_expr, outer_uq_expr))
+				{
+					find_uq_expr_in_clauselist = true;
+					break;
+				}
+			}
+			if (!find_uq_expr_in_clauselist)
+			{
+				/* No need to check the next exprs in the current uniquekey */
+				clauselist_matchs_all_exprs = false;
+				break;
+			}
+		}
+
+		if (clauselist_matchs_all_exprs)
+			/*
+			 * If the clauselist matches any UniqueKey from the outerrel, the
+			 * innerrel stays unique, given that innerrel->uniquekeys != NIL,
+			 * which is checked at the beginning.
+			 */
+			return true;
+	}
+	return false;
+}
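The core test here can be sketched as (Python, illustrative; `clause_outer_exprs` stands in for the outer-side expressions of the mergeable join clauses):

```python
def innerrel_keeps_unique(outer_uniquekeys, clause_outer_exprs):
    # The inner side stays unique after the join if the join clauses
    # cover some UniqueKey of the outer side completely: each inner row
    # can then join to at most one outer row, so inner rows are never
    # duplicated in the join result.
    return any(all(e in clause_outer_exprs for e in uk)
               for uk in outer_uniquekeys)
```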
+
+
+/*
+ * relation_is_onerow
+ * Check if it is a one-row relation by checking UniqueKey
+ */
+bool
+relation_is_onerow(RelOptInfo *rel)
+{
+	UniqueKey *ukey;
+	if (rel->uniquekeys == NIL)
+		return false;
+	ukey = linitial_node(UniqueKey, rel->uniquekeys);
+	if (ukey->onerow)
+	{
+		/* Some helpful tiny check for UniqueKey */
+
+		/* 1. We will only store one UniqueKey for this rel */
+		Assert(list_length(rel->uniquekeys) == 1);
+		/* 2. multi_nullvals must be false */
+		Assert(!ukey->multi_nullvals);
+		/* 3. exprs & positions must be NIL */
+		Assert(ukey->exprs == NIL);
+		Assert(ukey->positions == NIL);
+	}
+	return ukey->onerow;
+}
+
+/*
+ * relation_has_uniquekeys_for
+ *		Returns true if we have proofs that 'rel' cannot return multiple rows with
+ *		the same values in each of 'exprs'.  Otherwise returns false.
+ */
+bool
+relation_has_uniquekeys_for(PlannerInfo *root, RelOptInfo *rel, List *exprs)
+{
+	ListCell *lc;
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		if (ukey->multi_nullvals)
+			continue;
+		if (list_is_subset(ukey->exprs, exprs))
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * Examine the rel's restriction clauses for usable var = const clauses
+ */
+static List*
+gather_mergeable_baserestrictlist(RelOptInfo *rel)
+{
+	List	*restrictlist = NIL;
+	ListCell	*lc;
+	foreach(lc, rel->baserestrictinfo)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
+
+		/*
+		 * Note: can_join won't be set for a restriction clause, but
+		 * mergeopfamilies will be if it has a mergejoinable operator and
+		 * doesn't contain volatile functions.
+		 */
+		if (restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * The clause certainly doesn't refer to anything but the given rel.
+		 * If either side is pseudoconstant then we can use it.
+		 */
+		if (bms_is_empty(restrictinfo->left_relids))
+		{
+			/* righthand side is inner */
+			restrictinfo->outer_is_left = true;
+		}
+		else if (bms_is_empty(restrictinfo->right_relids))
+		{
+			/* lefthand side is inner */
+			restrictinfo->outer_is_left = false;
+		}
+		else
+			continue;
+
+		/* OK, add to list */
+		restrictlist = lappend(restrictlist, restrictinfo);
+	}
+	return restrictlist;
+}
+
+
+/*
+ * gather_mergeable_joinclauses
+ */
+static List*
+gather_mergeable_joinclauses(RelOptInfo *joinrel,
+							 RelOptInfo *outerrel,
+							 RelOptInfo *innerrel,
+							 List *restrictlist,
+							 JoinType jointype)
+{
+	List	*clause_list = NIL;
+	ListCell	*lc;
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *)lfirst(lc);
+		if (IS_OUTER_JOIN(jointype) &&
+			RINFO_IS_PUSHED_DOWN(restrictinfo, joinrel->relids))
+			continue;
+
+		/* Ignore if it's not a mergejoinable clause */
+		if (!restrictinfo->can_join ||
+			restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * Check if clause has the form "outer op inner" or "inner op outer",
+		 * and if so mark which side is inner.
+		 */
+		if (!clause_sides_match_join(restrictinfo, outerrel->relids, innerrel->relids))
+			continue;			/* no good for these input relations */
+
+		/* OK, add to list */
+		clause_list = lappend(clause_list, restrictinfo);
+	}
+	return clause_list;
+}
+
+
+/*
+ * Return true if every column of the unique index is matched by a
+ * "col = Const" clause in the restrictlist.
+ */
+static bool
+match_index_to_baserestrictinfo(IndexOptInfo *unique_ind, List *restrictlist)
+{
+	int c = 0;
+
+	/* A fast path to avoid the double-loop scan below */
+	if (list_length(restrictlist) < unique_ind->ncolumns)
+		return false;
+
+	for(c = 0;  c < unique_ind->ncolumns; c++)
+	{
+		ListCell	*lc;
+		bool	found_in_restrictinfo = false;
+		foreach(lc, restrictlist)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			Node	   *rexpr;
+
+			/*
+			 * The condition's equality operator must be a member of the
+			 * index opfamily, else it is not asserting the right kind of
+			 * equality behavior for this index.  We check this first
+			 * since it's probably cheaper than match_index_to_operand().
+			 */
+			if (!list_member_oid(rinfo->mergeopfamilies, unique_ind->opfamily[c]))
+				continue;
+
+			/*
+			 * XXX at some point we may need to check collations here too.
+			 * For the moment we assume all collations reduce to the same
+			 * notion of equality.
+			 */
+
+			/* OK, see if the condition operand matches the index key */
+			if (rinfo->outer_is_left)
+				rexpr = get_rightop(rinfo->clause);
+			else
+				rexpr = get_leftop(rinfo->clause);
+
+			if (match_index_to_operand(rexpr, c, unique_ind))
+			{
+				found_in_restrictinfo = true;
+				break;
+			}
+		}
+		if (!found_in_restrictinfo)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * add_uniquekey_from_index
+ *	Add a UniqueKey for the index, but only if all of the index's Vars
+ *	appear in rel->reltarget->exprs.
+ */
+static void
+add_uniquekey_from_index(RelOptInfo *rel, IndexOptInfo *unique_index)
+{
+	int	pos;
+	List	*exprs = NIL;
+	List	*positions = NIL;
+	bool	multi_nullvals = false;
+
+	/*
+	 * Fast path: check whether the indexed columns are used in this relation
+	 * at all; if not, return early.
+	 */
+	for(pos = 0; pos < unique_index->ncolumns; pos++)
+	{
+		int attno = unique_index->indexkeys[pos] - rel->min_attr;
+		if (bms_is_empty(rel->attr_needed[attno]))
+			return;
+	}
+
+	/* We still need to check the rel->reltarget->exprs to get the exprs and positions */
+	for(pos = 0; pos < unique_index->ncolumns; pos++)
+	{
+		ListCell	*lc;
+		bool	find_in_exprs = false;
+
+		foreach(lc, rel->reltarget->exprs)
+		{
+			Var *var;
+			if (!IsA(lfirst(lc), Var))
+				continue;
+			var = lfirst_node(Var, lc);
+			if (match_index_to_operand((Node *)lfirst(lc), pos, unique_index))
+			{
+				find_in_exprs = true;
+				exprs = lappend(exprs, lfirst(lc));
+				positions = lappend_int(positions, pos+1);
+				if (!bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
+								   rel->notnullattrs))
+					multi_nullvals = true;
+				break;
+			}
+		}
+		if (!find_in_exprs)
+			return;
+	}
+
+	if (exprs != NIL)
+	{
+		rel->uniquekeys = lappend(rel->uniquekeys,
+								  makeUniqueKey(exprs, positions, multi_nullvals, false));
+	}
+}
+
+
+/*
+ * add_uniquekey_for_onerow
+ * If we are sure the relation returns only one row, then all the columns
+ * are unique.  There is no need to create a UniqueKey for every expr;
+ * setting UniqueKey->onerow to true is enough.
+ */
+static void
+add_uniquekey_for_onerow(RelOptInfo *rel)
+{
+	rel->uniquekeys = list_make1(makeUniqueKey(NIL, /* No need to set exprs */
+											   NIL, /* No need to set positions */
+											   false, /* onerow can't have multi_nullvals */
+											   true));
+
+}
+
+/*
+ * initililze_uniquecontext_for_joinrel
+ * Return a List of UniqueKeyContext for an inputrel; the 'useful' flag is
+ * cleared later for uniquekeys which can't be used above the join.
+ */
+static List *
+initililze_uniquecontext_for_joinrel(RelOptInfo *joinrel,  RelOptInfo *inputrel)
+{
+	List	*res = NIL;
+	ListCell *lc;
+	foreach(lc,  inputrel->uniquekeys)
+	{
+		UniqueKeyContext context;
+		context = palloc(sizeof(struct UniqueKeyContextData));
+		context->uniquekey = lfirst_node(UniqueKey, lc);
+		context->added_to_joinrel = false;
+		context->useful = true;
+		res = lappend(res, context);
+	}
+	return res;
+}
+
+/*
+ * clause_sides_match_join
+ *	  Determine whether a join clause is of the right form to use in this join.
+ *
+ * We already know that the clause is a binary opclause referencing only the
+ * rels in the current join.  The point here is to check whether it has the
+ * form "outerrel_expr op innerrel_expr" or "innerrel_expr op outerrel_expr",
+ * rather than mixing outer and inner vars on either side.  If it matches,
+ * we set the transient flag outer_is_left to identify which side is which.
+ */
+static bool
+clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
+						Relids innerrelids)
+{
+	if (bms_is_subset(rinfo->left_relids, outerrelids) &&
+		bms_is_subset(rinfo->right_relids, innerrelids))
+	{
+		/* lefthand side is outer */
+		rinfo->outer_is_left = true;
+		return true;
+	}
+	else if (bms_is_subset(rinfo->left_relids, innerrelids) &&
+			 bms_is_subset(rinfo->right_relids, outerrelids))
+	{
+		/* righthand side is outer */
+		rinfo->outer_is_left = false;
+		return true;
+	}
+	return false;				/* no good for these input relations */
+}
+
+
+/*
+ * Partitioned table Unique Keys.
+ * A partitioned table's unique key is maintained as follows:
+ * 1. The index must be unique, as usual.
+ * 2. The index must contain the partition key.
+ * 3. The index must exist on all the child rels; see simple_indexinfo_equal
+ *    for how we compare indexes.
+ */
+
+/* index_constains_partkey
+ * Return true if the index contains the partition key.
+ */
+static bool
+index_constains_partkey(RelOptInfo *partrel, IndexOptInfo *ind)
+{
+	ListCell	*lc;
+	int	i;
+	Assert(IS_PARTITIONED_REL(partrel));
+
+	for(i = 0; i < partrel->part_scheme->partnatts; i++)
+	{
+		Node *part_expr = linitial(partrel->partexprs[i]);
+		bool found_in_index = false;
+		foreach(lc, ind->indextlist)
+		{
+			Expr *index_expr = lfirst_node(TargetEntry, lc)->expr;
+			if (equal(index_expr, part_expr))
+			{
+				found_in_index = true;
+				break;
+			}
+		}
+		if (!found_in_index)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * simple_indexinfo_equal
+ *
+ * Check whether two indexes are the same. The indexes here are COPIED
+ * from the childrels with some small changes applied (see simple_copy_indexinfo_to_parent).
+ */
+
+static bool
+simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2)
+{
+	Size oid_cmp_len = sizeof(Oid) * ind1->ncolumns;
+	return ind1->ncolumns == ind2->ncolumns &&
+		ind1->unique == ind2->unique &&
+		memcmp(ind1->indexkeys, ind2->indexkeys, sizeof(int) * ind1->ncolumns) == 0 &&
+		memcmp(ind1->opfamily, ind2->opfamily, oid_cmp_len) == 0 &&
+		memcmp(ind1->opcintype, ind2->opcintype, oid_cmp_len) == 0 &&
+		memcmp(ind1->sortopfamily, ind2->sortopfamily, oid_cmp_len) == 0 &&
+		equal(ind1->indextlist, ind2->indextlist);
+}
+
+/*
+ * These macros are copied from copyfuncs.c; we don't want to make
+ * simple_copy_indexinfo_to_parent public since it is such a customized copy.
+ */
+
+#define COPY_POINTER_FIELD(fldname, sz) \
+	do { \
+		Size	_size = (sz); \
+		newnode->fldname = palloc(_size); \
+		memcpy(newnode->fldname, from->fldname, _size); \
+	} while (0)
+
+#define COPY_NODE_FIELD(fldname) \
+	(newnode->fldname = copyObjectImpl(from->fldname))
+
+#define COPY_SCALAR_FIELD(fldname) \
+	(newnode->fldname = from->fldname)
+
+
+/*
+ * simple_copy_indexinfo_to_parent
+ * Copy the IndexOptInfo from a child rel to the parent, which is used to:
+ * 1. Test whether the same index exists on all the childrels.
+ * 2. Test whether parentrel->reltarget/baserestrictinfo matches this index.
+ * The copied and modified index is only used within this scope.
+ */
+static IndexOptInfo *
+simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+								IndexOptInfo *from)
+{
+	IndexOptInfo *newnode = makeNode(IndexOptInfo);
+
+	COPY_SCALAR_FIELD(ncolumns);
+	COPY_SCALAR_FIELD(nkeycolumns);
+	COPY_SCALAR_FIELD(unique);
+	COPY_SCALAR_FIELD(immediate);
+	/* We just need to know if it is NIL or not */
+	COPY_SCALAR_FIELD(indpred);
+	COPY_SCALAR_FIELD(predOK);
+	COPY_POINTER_FIELD(indexkeys, from->ncolumns * sizeof(int));
+	COPY_POINTER_FIELD(indexcollations, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opfamily, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opcintype, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(sortopfamily, from->ncolumns * sizeof(Oid));
+	COPY_NODE_FIELD(indextlist);
+
+	/*
+	 * We have to translate the Vars so that the later matching of the index
+	 * against quals (like pk = 1) and rel->reltarget works.
+	 */
+	ChangeVarNodes((Node*) newnode->indextlist,
+				   from->rel->relid,
+				   parentrel->relid, 0);
+	newnode->rel = parentrel;
+	return newnode;
+}
+
+/*
+ * adjust_partition_unique_indexlist
+ *
+ * Check each index in global_unique_indexes to see whether it also exists
+ * on the given childrel; any index that does not is removed from
+ * the list.
+ */
+static void
+adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+								  RelOptInfo *childrel,
+								  List **global_unique_indexes)
+{
+	ListCell	*lc, *lc2;
+	foreach(lc, *global_unique_indexes)
+	{
+		IndexOptInfo	*g_ind = lfirst_node(IndexOptInfo, lc);
+		bool found_in_child = false;
+
+		foreach(lc2, childrel->indexlist)
+		{
+			IndexOptInfo   *p_ind = lfirst_node(IndexOptInfo, lc2);
+			IndexOptInfo   *p_ind_copy;
+			if (!p_ind->unique || !p_ind->immediate ||
+				(p_ind->indpred != NIL && !p_ind->predOK))
+				continue;
+			p_ind_copy = simple_copy_indexinfo_to_parent(parentrel, p_ind);
+			if (simple_indexinfo_equal(p_ind_copy, g_ind))
+			{
+				found_in_child = true;
+				break;
+			}
+		}
+
+		if (!found_in_child)
+			/* There is no same index on other childrel, remove it */
+			*global_unique_indexes = foreach_delete_current(*global_unique_indexes, lc);
+	}
+}
+
+/* Helper function for grouprel/distinctrel */
+static void
+add_uniquekey_from_sortgroups(PlannerInfo *root, RelOptInfo *rel, List *sortgroups)
+{
+	Query *parse = root->parse;
+	ListCell *lc, *lc2;
+	List	*exprs = NIL, *colnos = NIL;
+
+	/* XXX: If there are vars referencing an upper query level, the semantics
+	 * are imprecise; should we avoid that? levelsup = 1 is just a demo; maybe
+	 * we need to check every level other than 0, which needs another pull_var walker.
+	 */
+	List	*upper_vars = pull_vars_of_level((Node *) sortgroups, 1);
+
+	if (upper_vars != NIL)
+		return;
+
+	foreach(lc, sortgroups)
+	{
+		Index sortref = lfirst_node(SortGroupClause, lc)->tleSortGroupRef;
+		int c = 1;
+		foreach(lc2, parse->targetList)	/* lc2, not lc: don't clobber the outer loop */
+		{
+			TargetEntry *tle = lfirst_node(TargetEntry, lc2);
+			if (tle->ressortgroupref == sortref)
+			{
+				exprs = lappend(exprs, tle->expr);
+				colnos = lappend_int(colnos, c);
+			}
+			++c;
+		}
+	}
+	rel->uniquekeys = lappend(rel->uniquekeys,
+							  makeUniqueKey(exprs,
+											colnos,
+											false, /* sortgroupclauses can't produce multi_nullvals */
+											relation_is_onerow(rel) /* should always be false */
+								  ));
+}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0ff660284..f8491e2bcf 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -439,6 +439,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
 	 * There may be references to the rel in root->fkey_list, but if so,
 	 * match_foreign_keys_to_quals() will get rid of them.
 	 */
+
+	root->all_baserels = bms_del_member(root->all_baserels, relid);
 }
 
 /*
@@ -574,49 +576,11 @@ reduce_unique_semijoins(PlannerInfo *root)
  * rel_supports_distinctness
  *		Could the relation possibly be proven distinct on some set of columns?
  *
- * This is effectively a pre-checking function for rel_is_distinct_for().
- * It must return true if rel_is_distinct_for() could possibly return true
- * with this rel, but it should not expend a lot of cycles.  The idea is
- * that callers can avoid doing possibly-expensive processing to compute
- * rel_is_distinct_for()'s argument lists if the call could not possibly
- * succeed.
  */
 static bool
 rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 {
-	/* We only know about baserels ... */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * For a plain relation, we only know how to prove uniqueness by
-		 * reference to unique indexes.  Make sure there's at least one
-		 * suitable unique index.  It must be immediately enforced, and if
-		 * it's a partial index, it must match the query.  (Keep these
-		 * conditions in sync with relation_has_unique_index_for!)
-		 */
-		ListCell   *lc;
-
-		foreach(lc, rel->indexlist)
-		{
-			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
-
-			if (ind->unique && ind->immediate &&
-				(ind->indpred == NIL || ind->predOK))
-				return true;
-		}
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Query	   *subquery = root->simple_rte_array[rel->relid]->subquery;
-
-		/* Check if the subquery has any qualities that support distinctness */
-		if (query_supports_distinctness(subquery))
-			return true;
-	}
-	/* We have no proof rules for any other rtekinds. */
-	return false;
+	return rel->uniquekeys != NIL;
 }
 
 /*
@@ -640,83 +604,33 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 static bool
 rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
 {
-	/*
-	 * We could skip a couple of tests here if we assume all callers checked
-	 * rel_supports_distinctness first, but it doesn't seem worth taking any
-	 * risk for.
-	 */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * Examine the indexes to see if we have a matching unique index.
-		 * relation_has_unique_index_for automatically adds any usable
-		 * restriction clauses for the rel, so we needn't do that here.
-		 */
-		if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
-			return true;
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Index		relid = rel->relid;
-		Query	   *subquery = root->simple_rte_array[relid]->subquery;
-		List	   *colnos = NIL;
-		List	   *opids = NIL;
-		ListCell   *l;
 
-		/*
-		 * Build the argument lists for query_is_distinct_for: a list of
-		 * output column numbers that the query needs to be distinct over, and
-		 * a list of equality operators that the output columns need to be
-		 * distinct according to.
-		 *
-		 * (XXX we are not considering restriction clauses attached to the
-		 * subquery; is that worth doing?)
-		 */
-		foreach(l, clause_list)
+	ListCell   *lc1, *lc2, *lc3;
+	foreach(lc1, rel->uniquekeys)
+	{
+		UniqueKey *uqk = lfirst_node(UniqueKey, lc1);
+		bool all_uqk_exprs_found = true;
+		foreach(lc2, uqk->exprs)
 		{
-			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
-			Oid			op;
-			Var		   *var;
-
-			/*
-			 * Get the equality operator we need uniqueness according to.
-			 * (This might be a cross-type operator and thus not exactly the
-			 * same operator the subquery would consider; that's all right
-			 * since query_is_distinct_for can resolve such cases.)  The
-			 * caller's mergejoinability test should have selected only
-			 * OpExprs.
-			 */
-			op = castNode(OpExpr, rinfo->clause)->opno;
-
-			/* caller identified the inner side for us */
-			if (rinfo->outer_is_left)
-				var = (Var *) get_rightop(rinfo->clause);
-			else
-				var = (Var *) get_leftop(rinfo->clause);
-
-			/*
-			 * We may ignore any RelabelType node above the operand.  (There
-			 * won't be more than one, since eval_const_expressions() has been
-			 * applied already.)
-			 */
-			if (var && IsA(var, RelabelType))
-				var = (Var *) ((RelabelType *) var)->arg;
-
-			/*
-			 * If inner side isn't a Var referencing a subquery output column,
-			 * this clause doesn't help us.
-			 */
-			if (!var || !IsA(var, Var) ||
-				var->varno != relid || var->varlevelsup != 0)
-				continue;
-
-			colnos = lappend_int(colnos, var->varattno);
-			opids = lappend_oid(opids, op);
+			Node *uq_expr = lfirst(lc2);
+			bool found_uq_expr_in_clause_list = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *clause_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				if (equal(uq_expr, clause_expr))
+				{
+					found_uq_expr_in_clause_list = true;
+					break;
+				}
+			}
+			if (!found_uq_expr_in_clause_list)
+			{
+				all_uqk_exprs_found = false;
+				break;
+			}
 		}
-
-		if (query_is_distinct_for(subquery, colnos, opids))
+		if (all_uqk_exprs_found)
 			return true;
 	}
 	return false;
@@ -972,6 +886,9 @@ innerrel_is_unique(PlannerInfo *root,
 	MemoryContext old_context;
 	ListCell   *lc;
 
+	if (relation_is_onerow(innerrel))
+		return true;
+
 	/* Certainly can't prove uniqueness when there are no joinclauses */
 	if (restrictlist == NIL)
 		return false;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e978b491f6..ca6303915c 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -830,6 +830,16 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		{
 			Node	   *qual = (Node *) lfirst(l);
 
+			/* record the attributes this qual proves to be not null */
+			ListCell	*lc;
+			List		*non_nullable_vars = find_nonnullable_vars(qual);
+			foreach(lc, non_nullable_vars)
+			{
+				Var *var = lfirst_node(Var, lc);
+				RelOptInfo *rel = root->simple_rel_array[var->varno];
+				rel->notnullattrs = bms_add_member(rel->notnullattrs,
+												   var->varattno - FirstLowInvalidHeapAttributeNumber);
+			}
 			distribute_qual_to_rels(root, qual,
 									false, below_outer_join, JOIN_INNER,
 									root->qual_security_level,
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..6ad73cb57b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -213,19 +213,6 @@ query_planner(PlannerInfo *root,
 	 */
 	fix_placeholder_input_needed_levels(root);
 
-	/*
-	 * Remove any useless outer joins.  Ideally this would be done during
-	 * jointree preprocessing, but the necessary information isn't available
-	 * until we've built baserel data structures and classified qual clauses.
-	 */
-	joinlist = remove_useless_joins(root, joinlist);
-
-	/*
-	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
-	 * Likewise, this can't be done until now for lack of needed info.
-	 */
-	reduce_unique_semijoins(root);
-
 	/*
 	 * Now distribute "placeholders" to base rels as needed.  This has to be
 	 * done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b31c52404a..6f1d9b6b2f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2389,6 +2389,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		add_path(final_rel, path);
 	}
 
+	simple_copy_uniquekeys(current_rel, final_rel);
+
 	/*
 	 * Generate partial paths for final_rel, too, if outer query levels might
 	 * be able to make use of them.
@@ -3901,6 +3903,8 @@ create_grouping_paths(PlannerInfo *root,
 	}
 
 	set_cheapest(grouped_rel);
+
+	populate_grouprel_uniquekeys(root, grouped_rel);
 	return grouped_rel;
 }
 
@@ -4618,7 +4622,7 @@ create_window_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(window_rel);
-
+	simple_copy_uniquekeys(input_rel, window_rel);
 	return window_rel;
 }
 
@@ -4915,7 +4919,7 @@ create_distinct_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(distinct_rel);
-
+	populate_distinctrel_uniquekeys(root, input_rel, distinct_rel);
 	return distinct_rel;
 }
 
@@ -5176,6 +5180,8 @@ create_ordered_paths(PlannerInfo *root,
 	 */
 	Assert(ordered_rel->pathlist != NIL);
 
+	simple_copy_uniquekeys(input_rel, ordered_rel);
+
 	return ordered_rel;
 }
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 951aed80e7..8aa4d24cb0 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -689,6 +689,8 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	/* Undo effects of possibly forcing tuple_fraction to 0 */
 	root->tuple_fraction = save_fraction;
 
+	/* Add the UniqueKeys */
+	populate_unionrel_uniquiekeys(root, result_rel);
 	return result_rel;
 }
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 25545029d7..0b2f9d398a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -463,6 +464,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->notnullattrs == NULL);
+	for (i = 0; i < relation->rd_att->natts; i++)
+	{
+		Form_pg_attribute attr = TupleDescAttr(relation->rd_att, i);
+		if (attr->attnotnull)
+			rel->notnullattrs = bms_add_member(rel->notnullattrs,
+											   attr->attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 31d9aedeeb..fd050249d2 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -16,6 +16,7 @@
 
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 
 extern A_Expr *makeA_Expr(A_Expr_Kind kind, List *name,
@@ -105,4 +106,6 @@ extern GroupingSet *makeGroupingSet(GroupingSetKind kind, List *content, int loc
 
 extern VacuumRelation *makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols);
 
+extern UniqueKey *makeUniqueKey(List *exprs, List *positions,
+								bool multi_nullvals, bool onerow);
 #endif							/* MAKEFUNC_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b4e4..41110ed888 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -264,6 +264,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1c83772d62..f7f29f8018 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -709,6 +709,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	/* Not-null attrs, offset by -FirstLowInvalidHeapAttributeNumber */
+	Bitmapset		*notnullattrs;
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
@@ -728,6 +730,7 @@ typedef struct RelOptInfo
 	QualCost	baserestrictcost;	/* cost of evaluating the above */
 	Index		baserestrict_min_security;	/* min security_level found in
 											 * baserestrictinfo */
+	List	   *uniquekeys;		/* List of UniqueKey */
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
 	bool		has_eclass_joins;	/* T means joininfo is incomplete */
@@ -1045,6 +1048,35 @@ typedef struct PathKey
 } PathKey;
 
 
+/*
+ * UniqueKey
+ *
+ * Represents the unique properties held by a RelOptInfo.
+ *
+ * exprs is a list of exprs which are unique on the current RelOptInfo.
+ * positions is a list of the positions of the corresponding exprs in the
+ * current reloptinfo->reltarget. It is used when we translate a UniqueKey
+ * from a subquery.
+ * multi_nullvals: true means multiple null values may exist in these exprs,
+ * so uniqueness is not guaranteed in that case. This field is necessary for
+ * remove_useless_joins & reduce_unique_semijoins, where we don't mind such
+ * duplicated NULL values. It is set to true in 2 cases. One is a unique key
+ * built from a unique index whose columns are nullable. The other is for
+ * outer joins; see populate_joinrel_uniquekeys for details.
+ * onerow means the relation returns only 1 row, e.g. a filter on a unique
+ * index, an aggregate without a GROUP BY, or a join of two 1-row relations.
+ * As an optimization, if onerow is true we do not record every expr as a
+ * UniqueKey; we store exprs as NIL instead.
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+	List	   *exprs;
+	List	   *positions;
+	bool		multi_nullvals;
+	bool		onerow;
+} UniqueKey;
+
 /*
  * PathTarget
  *
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 14ea2766ad..621f54a9f8 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -528,6 +528,8 @@ extern bool list_member_ptr(const List *list, const void *datum);
 extern bool list_member_int(const List *list, int datum);
 extern bool list_member_oid(const List *list, Oid datum);
 
+extern bool list_is_subset(const List *members, const List *target);
+
 extern List *list_delete(List *list, void *datum);
 extern List *list_delete_ptr(List *list, void *datum);
 extern List *list_delete_int(List *list, int datum);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 10b6e81079..d12e5295c3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -241,4 +241,38 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ */
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel,
+										List* unique_index_list);
+extern void populate_partitionedrel_uniquekeys(PlannerInfo *root,
+												RelOptInfo *rel,
+												List *childrels);
+extern void populate_distinctrel_uniquekeys(PlannerInfo *root,
+											RelOptInfo *inputrel,
+											RelOptInfo *distinctrel);
+extern void populate_grouprel_uniquekeys(PlannerInfo *root,
+										 RelOptInfo *grouprel);
+extern void populate_unionrel_uniquiekeys(PlannerInfo *root,
+										  RelOptInfo *unionrel);
+extern void simple_copy_uniquekeys(RelOptInfo *oldrel,
+								   RelOptInfo *newrel);
+extern void convert_subquery_uniquekeys(PlannerInfo *root,
+										RelOptInfo *currel,
+										RelOptInfo *sub_final_rel);
+extern void populate_joinrel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *rel1,
+										RelOptInfo *rel2,
+										List *restrictlist,
+										JoinType jointype);
+
+extern bool relation_has_uniquekeys_for(PlannerInfo *root,
+										RelOptInfo *rel,
+										List *exprs);
+extern bool relation_is_onerow(RelOptInfo *rel);
+
 #endif							/* PATHS_H */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 14cdcfcca6..7b574efb0c 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1109,13 +1109,13 @@ explain (costs off) select a,c from t1 group by a,c,d;
 (3 rows)
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
                       QUERY PLAN                      
 ------------------------------------------------------
  HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.y
+   Group Key: t1.a, t1.c, t1.d, t2.x, t2.y
    ->  Hash Join
          Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
          ->  Seq Scan on t2
@@ -1125,13 +1125,13 @@ group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
                       QUERY PLAN                      
 ------------------------------------------------------
  HashAggregate
    Group Key: t1.a, t1.b, t2.x, t2.z
-   ->  Hash Join
+   ->  Hash Left Join
          Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
          ->  Seq Scan on t2
          ->  Hash
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 761376b007..54d987405a 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4349,11 +4349,11 @@ reset enable_nestloop;
 --
 begin;
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 -- all three cases should be optimizable into a simple seqscan
@@ -4415,36 +4415,35 @@ select d.* from d left join (select distinct * from b) s
 -- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
 -- but this happens too late for join removal in the outer plan level.)
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
-                QUERY PLAN                
-------------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Group
-         Group Key: b.id
-         ->  Index Scan using b_pkey on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.d, b.c_id
+                     ->  Seq Scan on b
 (8 rows)
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(9 rows)
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.c_id, b.d
+                     ->  Seq Scan on b
+(8 rows)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 9480abd577..3446c3e9fd 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -394,13 +394,13 @@ explain (costs off) select * from t1 group by a,b,c,d;
 explain (costs off) select a,c from t1 group by a,c,d;
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
 
 -- Cannot optimize when PK is deferrable
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 5fc6617369..d52a6052de 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1479,11 +1479,11 @@ reset enable_nestloop;
 begin;
 
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 
@@ -1516,13 +1516,13 @@ select d.* from d left join (select distinct * from b) s
 -- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
 -- but this happens too late for join removal in the outer plan level.)
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
-- 
2.25.1

Attachment: v5-0002-Skip-DISTINCT-GROUP-BY-if-input-is-already-unique.patch
From 20282bc40bb59686a11eb8180b9c4d39d224f570 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 14 Apr 2020 20:21:07 +1200
Subject: [PATCH v5 2/2] Skip DISTINCT / GROUP BY if input is already unique

If we can detect that the input to a DISTINCT is already unique on the
SELECT list, then we can completely skip doing any uniquification work.
Likewise with GROUP BY; however, we can only do this when there are no
aggregate functions.
---
 .../postgres_fdw/expected/postgres_fdw.out    |  28 +-
 src/backend/optimizer/plan/planner.c          |  27 +-
 src/test/regress/expected/aggregates.out      |  65 ++--
 src/test/regress/expected/select_distinct.out | 335 ++++++++++++++++++
 src/test/regress/sql/select_distinct.sql      | 118 ++++++
 5 files changed, 515 insertions(+), 58 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index e941b7e538..88441568b7 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2904,22 +2904,20 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 f
 -- Outer query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
- Unique
+                                                       QUERY PLAN                                                       
+------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: ((SubPlan 1))
-   ->  Sort
-         Output: ((SubPlan 1))
-         Sort Key: ((SubPlan 1))
-         ->  Foreign Scan
-               Output: (SubPlan 1)
-               Relations: Aggregate on (public.ft2 t2)
-               Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
-               SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+   Sort Key: ((SubPlan 1))
+   ->  Foreign Scan
+         Output: (SubPlan 1)
+         Relations: Aggregate on (public.ft2 t2)
+         Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
+         SubPlan 1
+           ->  Foreign Scan on public.ft1 t1
+                 Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                 Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(11 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 6f1d9b6b2f..f5b2794c14 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3815,6 +3815,23 @@ create_grouping_paths(PlannerInfo *root,
 	Query	   *parse = root->parse;
 	RelOptInfo *grouped_rel;
 	RelOptInfo *partially_grouped_rel;
+	List	*groupExprs = NIL;
+
+	if (root->parse->groupingSets == NIL)
+	{
+		groupExprs  = get_sortgrouplist_exprs(parse->groupClause,
+														parse->targetList);
+		/*
+		 * If the GROUP BY clause is already unique, the grouping node is not
+		 * necessary when there are no aggregate functions.
+		 */
+		if (groupExprs != NIL &&
+			!parse->hasAggs &&
+			!parse->hasWindowFuncs &&
+			parse->havingQual == NULL &&
+			relation_has_uniquekeys_for(root, input_rel, groupExprs))
+			return input_rel;
+	}
 
 	/*
 	 * Create grouping relation to hold fully aggregated grouping and/or
@@ -4741,6 +4758,12 @@ create_distinct_paths(PlannerInfo *root,
 	bool		allow_hash;
 	Path	   *path;
 	ListCell   *lc;
+	List	   *distinctExprs =  get_sortgrouplist_exprs(parse->distinctClause,
+														 parse->targetList);
+
+	/* If the result is unique already, we just return the input_rel directly */
+	if (relation_has_uniquekeys_for(root, input_rel, distinctExprs))
+		return input_rel;
 
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
@@ -4778,10 +4801,6 @@ create_distinct_paths(PlannerInfo *root,
 		/*
 		 * Otherwise, the UNIQUE filter has effects comparable to GROUP BY.
 		 */
-		List	   *distinctExprs;
-
-		distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
-												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
 											  NULL);
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 7b574efb0c..42bd180895 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -870,14 +870,12 @@ explain (costs off)
   select distinct max(unique2) from tenk1;
                              QUERY PLAN                              
 ---------------------------------------------------------------------
- HashAggregate
-   Group Key: $0
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
                  Index Cond: (unique2 IS NOT NULL)
-   ->  Result
-(7 rows)
+(5 rows)
 
 select distinct max(unique2) from tenk1;
  max  
@@ -1036,7 +1034,7 @@ explain (costs off)
   select distinct min(f1), max(f1) from minmaxtest;
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
- Unique
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Merge Append
@@ -1059,10 +1057,7 @@ explain (costs off)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
-   ->  Sort
-         Sort Key: ($0), ($1)
-         ->  Result
-(26 rows)
+(23 rows)
 
 select distinct min(f1), max(f1) from minmaxtest;
  min | max 
@@ -1092,12 +1087,10 @@ create temp table t2 (x int, y int, z int, primary key (x, y));
 create temp table t3 (a int, b int, c int, primary key(a, b) deferrable);
 -- Non-primary-key columns can be removed from GROUP BY
 explain (costs off) select * from t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 -- No removal can happen if the complete PK is not present in GROUP BY
 explain (costs off) select a,c from t1 group by a,c,d;
@@ -1112,16 +1105,14 @@ explain (costs off) select a,c from t1 group by a,c,d;
 explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.c, t1.d, t2.x, t2.y
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
@@ -1161,12 +1152,10 @@ explain (costs off) select * from t1 group by a,b,c,d;
 
 -- Okay to remove columns if we're only querying the parent.
 explain (costs off) select * from only t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 create temp table p_t1 (
   a int,
@@ -1179,14 +1168,12 @@ create temp table p_t1_1 partition of p_t1 for values in(1);
 create temp table p_t1_2 partition of p_t1 for values in(2);
 -- Ensure we can remove non-PK columns for partitioned tables.
 explain (costs off) select * from p_t1 group by a,b,c,d;
-           QUERY PLAN           
---------------------------------
- HashAggregate
-   Group Key: p_t1.a, p_t1.b
-   ->  Append
-         ->  Seq Scan on p_t1_1
-         ->  Seq Scan on p_t1_2
-(5 rows)
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on p_t1_1
+   ->  Seq Scan on p_t1_2
+(3 rows)
 
 drop table t1 cascade;
 NOTICE:  drop cascades to table t1c
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 11c6f50fbf..227bc27af4 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -306,3 +306,338 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+    QUERY PLAN    
+------------------
+ Seq Scan on uqk1
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+           QUERY PLAN            
+---------------------------------
+ HashAggregate
+   Group Key: d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+     QUERY PLAN     
+--------------------
+ Seq Scan on uqk1
+   Filter: (pk = 1)
+(2 rows)
+
+-- Test join
+-- both uqk1 (c, d) and uqk2(pk) are unique keys, so distinct is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+           QUERY PLAN            
+---------------------------------
+ Hash Join
+   Hash Cond: (uqk1.a = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+   ->  Hash
+         ->  Seq Scan on uqk2
+(6 rows)
+
+-- Distinct is needed because of the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+                 QUERY PLAN                  
+---------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.pk
+         ->  Hash Right Join
+               Hash Cond: (uqk1.a = uqk2.pk)
+               ->  Seq Scan on uqk1
+               ->  Hash
+                     ->  Seq Scan on uqk2
+(8 rows)
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+ pk 
+----
+  1
+   
+(2 rows)
+
+-- Distinct is not needed since uqk1 is the left table in the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+           QUERY PLAN            
+---------------------------------
+ Sort
+   Sort Key: uqk1.c, uqk1.d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+ c | d 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+(3 rows)
+
+-- Distinct is not needed even for a cross join when both UNIQUE keys appear in the targetlist
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+          QUERY PLAN          
+------------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+   ->  Materialize
+         ->  Seq Scan on uqk2
+(4 rows)
+
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+ a | pk | c | d | a | pk | c | d 
+---+----+---+---+---+----+---+---
+ 1 |  1 | 1 | 1 | 1 |  1 | 1 | 1
+ 1 |  1 | 1 | 1 | 4 |  4 | 4 | 4
+ 1 |  1 | 1 | 1 | 5 |  5 | 5 | 5
+ 2 |  2 | 2 | 2 | 1 |  1 | 1 | 1
+ 2 |  2 | 2 | 2 | 4 |  4 | 4 | 4
+ 2 |  2 | 2 | 2 | 5 |  5 | 5 | 5
+ 3 |  3 | 3 | 3 | 1 |  1 | 1 | 1
+ 3 |  3 | 3 | 3 | 4 |  4 | 4 | 4
+ 3 |  3 | 3 | 3 | 5 |  5 | 5 | 5
+(9 rows)
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+           QUERY PLAN           
+--------------------------------
+ Hash Semi Join
+   Hash Cond: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+   ->  Hash
+         ->  Seq Scan on uqk2
+(5 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+             QUERY PLAN             
+------------------------------------
+ Seq Scan on uqk1
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on uqk2
+(4 rows)
+
+-- Test the UniqueKey for the one-row case; DISTINCT is NOT needed as well.
+-- uqk1.d is a uniquekey due to the onerow rule; uqk2.pk is the pk
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+            QUERY PLAN             
+-----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+(5 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both uqk1.d AND uqk2.c are uniquekeys due to the onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+            QUERY PLAN            
+----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(6 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both UniqueKeys in the targetList
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+        QUERY PLAN        
+--------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(5 rows)
+
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+ c | c 
+---+---
+ 2 | 1
+(1 row)
+
+-- Test SubQuery
+-- t2(a, c) is a UNIQUE key because of the GROUP BY
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+              QUERY PLAN               
+---------------------------------------
+ Hash Join
+   Hash Cond: ((sum(uqk2.pk)) = t1.pk)
+   ->  HashAggregate
+         Group Key: uqk2.a, uqk2.c
+         ->  Seq Scan on uqk2
+   ->  Hash
+         ->  Seq Scan on uqk1 t1
+(7 rows)
+
+-- Test partitioned TABLE
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+-- CREATE a unique INDEX ON dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP INDEX dist_p_uk_b_c;
+-- we also support a unique INDEX ON each child table
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- NOT ok, since dist_p1 has no such INDEX
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ HashAggregate
+   Group Key: dist_p.a, dist_p.b, dist_p.c, dist_p.d
+   ->  Append
+         ->  Seq Scan on dist_p0 dist_p_1
+         ->  Seq Scan on dist_p1 dist_p_2
+(5 rows)
+
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP INDEX dist_p0_uk_bc;
+DROP INDEX dist_p1_uk_bc;
+-- the uk is the same ON all child tables, but it doesn't include the partkey, so NOT ok either.
+CREATE UNIQUE INDEX dist_p0_uk_c ON dist_p0(c);
+CREATE UNIQUE INDEX dist_p1_uk_c ON dist_p1(c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ HashAggregate
+   Group Key: dist_p.a, dist_p.b, dist_p.c, dist_p.d
+   ->  Append
+         ->  Seq Scan on dist_p0 dist_p_1
+         ->  Seq Scan on dist_p1 dist_p_2
+(5 rows)
+
+DROP TABLE dist_p;
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+DEALLOCATE pt;
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 33102744eb..72716217c0 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -135,3 +135,121 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+
+
+-- Test join
+-- both uqk1 (c, d) and uqk2(pk) are unique keys, so distinct is not needed.
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+
+-- Distinct is needed because of the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+
+-- Distinct is not needed since uqk1 is the left table in the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+-- Distinct is not needed even for a cross join when both UNIQUE keys appear in the targetlist
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+
+-- Test the UniqueKey for the one-row case; DISTINCT is NOT needed as well.
+-- uqk1.d is a uniquekey due to the onerow rule; uqk2.pk is the pk
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+
+-- Both uqk1.d AND uqk2.c are uniquekeys due to the onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+
+-- Both UniqueKeys in the targetList
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+
+-- Test SubQuery
+-- t2(a, c) is a UNIQUE key because of the GROUP BY
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+
+-- Test partitioned TABLE
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+
+-- CREATE a unique INDEX ON dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+DROP INDEX dist_p_uk_b_c;
+
+-- we also support a unique INDEX ON each child table
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- NOT ok, since dist_p1 has no such INDEX
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+
+DROP INDEX dist_p0_uk_bc;
+DROP INDEX dist_p1_uk_bc;
+
+-- the uk is the same ON all child tables, but it doesn't include the partkey, so NOT ok either.
+CREATE UNIQUE INDEX dist_p0_uk_c ON dist_p0(c);
+CREATE UNIQUE INDEX dist_p1_uk_c ON dist_p1(c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+
+DROP TABLE dist_p;
+
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+DEALLOCATE pt;
+
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
-- 
2.25.1

#16Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#15)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Hi David:

Thanks for your time.

1. Out of date comment in join.sql

-- join removal is not possible when the GROUP BY contains a column that is
-- not in the join condition. (Note: as of 9.6, we notice that b.id is a
-- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
-- but this happens too late for join removal in the outer plan level.)
explain (costs off)
select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
on d.a = s.d;

You've changed the GROUP BY clause so it does not include b.id, so the
Note in the comment is now misleading.

Thanks, I will fix this one in the following patch.

2. I think 0002 is overly restrictive in its demands that
parse->hasAggs must be false. We should be able to just use a Group
Aggregate with unsorted input when the input_rel is unique on the
GROUP BY clause. This will save on hashing and sorting. Basically
similar to what we do for when a query contains aggregates without any
GROUP BY.

Yes, that would be a perfect result. The difficulty is that the current
aggregate function execution is tightly coupled with the Agg node
(ExecInitAgg), which is removed in the unique case. I once made sum (without
a finalfn) and avg (with a finalfn) work in a hacky way, but many things
were still unhandled. Let me prepare the code for this purpose in one or two
days to see whether I'm going in the right direction.

Ashutosh also has an idea[1] that if the relation underlying an Agg node is
known to be unique for the given groupByClause, we could safely use the
AGG_SORTED strategy. Though the input is not sorted, it is unique, so for
every row the Agg node will combine/finalize the aggregate result.

I will target the perfect result first and see how much effort it needs;
if that doesn't work out, I will try Ashutosh's suggestion.

3. I don't quite understand why you changed this to a right join:

-- Test case where t1 can be optimized but not t2
explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y

Perhaps this change is left over from some previous version of the patch?

This is on purpose. The original test case is used to test that we can
shorten the group key for t1 but not for t2 during aggregation, but if I
keep the inner join, the Agg node is removed entirely, so I changed it to a
right join in order to keep the Agg node. The full test case is:

-- Test case where t1 can be optimized but not t2
explain (costs off) select t1.*,t2.x,t2.z
from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;

where (a, b) is the primary key of t1.

[1]: /messages/by-id/CAExHW5sY+L6iZ=rwnL7n3jET7aNLCNQimvfcS7C+5wmdjmdPiw@mail.gmail.com

#17David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#16)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Wed, 15 Apr 2020 at 12:19, Andy Fan <zhihui.fan1213@gmail.com> wrote:

2. I think 0002 is overly restrictive in its demands that
parse->hasAggs must be false. We should be able to just use a Group
Aggregate with unsorted input when the input_rel is unique on the
GROUP BY clause. This will save on hashing and sorting. Basically
similar to what we do for when a query contains aggregates without any
GROUP BY.

Yes, that would be a perfect result. The difficulty is that the current
aggregate function execution is tightly coupled with the Agg node
(ExecInitAgg), which is removed in the unique case.

This case here would be slightly different. It would be handled by
still creating a Group Aggregate path, but just not consider Hash
Aggregate and not Sort the input to the Group Aggregate path. Perhaps
that's best done by creating a new flag bit and using it in
create_grouping_paths() in the location where we set the flags
variable. If you determine that the input_rel is unique for the GROUP
BY clause, and that there are aggregate functions, then set a flag,
e.g GROUPING_INPUT_UNIQUE. Likely there will be a few other flags that
you can skip setting in that function, for example, there's no need to
check if the input can sort, so no need for GROUPING_CAN_USE_SORT,
since you won't need to sort, likewise for GROUPING_CAN_USE_HASH. I'd
say there also is no need for checking if we can set
GROUPING_CAN_PARTIAL_AGG (What would be the point in doing partial
aggregation when there's 1 row per group?) Then down in
add_paths_to_grouping_rel(), just add a special case before doing any
other code, such as:

if ((extra->flags & GROUPING_INPUT_UNIQUE) != 0 && parse->groupClause != NIL)
{
    Path       *path = input_rel->cheapest_total_path;

    add_path(grouped_rel, (Path *)
             create_agg_path(root,
                             grouped_rel,
                             path,
                             grouped_rel->reltarget,
                             AGG_SORTED,
                             AGGSPLIT_SIMPLE,
                             parse->groupClause,
                             havingQual,
                             agg_costs,
                             dNumGroups));
    return;
}

You may also want to consider the cheapest startup path there too so
that the LIMIT processing can do something smarter later in planning
(assuming cheapest_total_path != cheapest_startup_path (which you'd
need to check for)).

Perhaps it would be better to only set the GROUPING_INPUT_UNIQUE if
there is a groupClause, then just Assert(parse->groupClause != NIL)
inside that if.

David

#18Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#17)
2 attachment(s)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Wed, Apr 15, 2020 at 11:00 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 15 Apr 2020 at 12:19, Andy Fan <zhihui.fan1213@gmail.com> wrote:

2. I think 0002 is overly restrictive in its demands that
parse->hasAggs must be false. We should be able to just use a Group
Aggregate with unsorted input when the input_rel is unique on the
GROUP BY clause. This will save on hashing and sorting. Basically
similar to what we do for when a query contains aggregates without any
GROUP BY.

Yes, that would be a perfect result. The difficulty is that the current
aggregate function execution is tightly coupled with the Agg node
(ExecInitAgg), which is removed in the unique case.

This case here would be slightly different. It would be handled by
still creating a Group Aggregate path, but just not consider Hash
Aggregate and not Sort the input to the Group Aggregate path. Perhaps
that's best done by creating a new flag bit and using it in
create_grouping_paths() in the location where we set the flags
variable. If you determine that the input_rel is unique for the GROUP
BY clause, and that there are aggregate functions, then set a flag,
e.g GROUPING_INPUT_UNIQUE. Likely there will be a few other flags that
you can skip setting in that function, for example, there's no need to
check if the input can sort, so no need for GROUPING_CAN_USE_SORT,
since you won't need to sort, likewise for GROUPING_CAN_USE_HASH. I'd
say there also is no need for checking if we can set
GROUPING_CAN_PARTIAL_AGG (What would be the point in doing partial
aggregation when there's 1 row per group?) Then down in
add_paths_to_grouping_rel(), just add a special case before doing any
other code, such as:

if ((extra->flags & GROUPING_INPUT_UNIQUE) != 0 && parse->groupClause != NIL)
{
    Path       *path = input_rel->cheapest_total_path;

    add_path(grouped_rel, (Path *)
             create_agg_path(root,
                             grouped_rel,
                             path,
                             grouped_rel->reltarget,
                             AGG_SORTED,
                             AGGSPLIT_SIMPLE,
                             parse->groupClause,
                             havingQual,
                             agg_costs,
                             dNumGroups));
    return;
}

You may also want to consider the cheapest startup path there too so
that the LIMIT processing can do something smarter later in planning
(assuming cheapest_total_path != cheapest_startup_path (which you'd
need to check for)).

Perhaps it would be better to only set the GROUPING_INPUT_UNIQUE if
there is a groupClause, then just Assert(parse->groupClause != NIL)
inside that if.

Thank you for your detailed explanation. The attached v6 includes this
feature. Here is some data showing the improvement.

Test cases:
create table grp2 (a int primary key, b char(200), c int);
insert into grp2 select i, 'x', i from generate_series(1, 10000000)i;
analyze grp2;
explain analyze select a, sum(c) from grp2 group by a;

w/o this feature:

postgres=# explain analyze select a, sum(c) from grp2 group by a;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.43..712718.44 rows=10000000 width=12) (actual time=0.088..15491.027 rows=10000000 loops=1)
   Group Key: a
   ->  Index Scan using grp2_pkey on grp2  (cost=0.43..562718.44 rows=10000000 width=8) (actual time=0.068..6503.459 rows=10000000 loops=1)
 Planning Time: 0.916 ms
 Execution Time: *16252.397* ms
(5 rows)

Since the order of my data in heap and index is exactly same, which makes
the index scan much faster. The following is to test the cost of the
*hash* aggregation,

postgres=# set enable_indexscan to off;
SET
postgres=# explain analyze select a, sum(c) from grp2 group by a;
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=765531.00..943656.00 rows=10000000 width=12) (actual time=14424.379..30133.171 rows=10000000 loops=1)
   Group Key: a
   Planned Partitions: 128
   Peak Memory Usage: 4153 kB
   Disk Usage: 2265608 kB
   HashAgg Batches: 640
   ->  Seq Scan on grp2  (cost=0.00..403031.00 rows=10000000 width=8) (actual time=0.042..2808.281 rows=10000000 loops=1)
 Planning Time: 0.159 ms
 Execution Time: *31098.804* ms
(9 rows)

With this feature:

explain analyze select a, sum(c) from grp2 group by a;
                                                         QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.00..553031.57 rows=10000023 width=12) (actual time=0.044..13209.485 rows=10000000 loops=1)
   Group Key: a
   ->  Seq Scan on grp2  (cost=0.00..403031.23 rows=10000023 width=8) (actual time=0.023..4938.171 rows=10000000 loops=1)
 Planning Time: 0.400 ms
 Execution Time: *13749.121* ms
(5 rows)

During the implementation, I also added an AGG_UNIQUE AggStrategy to
record this information in the Agg plan node; this is a simple way to do it
and should be semantically correct.

-

V6 also includes:
1. A fix for the misleading comment you mentioned above.
2. A fix for a corner case in the `relation_has_uniquekeys_for` function.

+       /* For the UniqueKey->onerow case, uniquekey->exprs is empty as well,
+        * so we can't rely on list_is_subset to handle this special case.
+        */
+       if (exprs == NIL)
+               return false;
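The reason the guard is needed: any subset test is vacuously true for an empty list, so without the early return an empty query expr list would wrongly "match" a onerow UniqueKey (whose exprs list is also empty). A toy illustration with integer arrays standing in for expression lists (this is not the actual list_is_subset from the backend's List API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Naive subset test: every element of sub must appear in super.
 * When nsub == 0 the loop body never runs, so the result is
 * vacuously true -- which is exactly the case the patch guards
 * against with "if (exprs == NIL) return false;".
 */
static bool
list_is_subset(const int *sub, size_t nsub,
			   const int *super, size_t nsuper)
{
	for (size_t i = 0; i < nsub; i++)
	{
		bool		found = false;

		for (size_t j = 0; j < nsuper; j++)
		{
			if (sub[i] == super[j])
			{
				found = true;
				break;
			}
		}
		if (!found)
			return false;
	}
	return true;				/* vacuously true for an empty sub list */
}
```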

Best Regards
Andy Fan

Attachments:

v6-0002-Skip-DISTINCT-GROUP-BY-if-input-is-already-unique.patch (application/octet-stream)
From 0f01a6a9c35888f1fc08e48545832bd22707b14b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Thu, 16 Apr 2020 09:58:05 +0800
Subject: [PATCH v6 2/2] Skip DISTINCT / GROUP BY if input is already unique

If we can detect that the input to a DISTINCT is already unique on the
SELECT list, then we can completely skip doing any uniquification work.
Likewise with GROUP BY, however, we can only do this when there are no
aggregate functions.
---
 .../postgres_fdw/expected/postgres_fdw.out    |  28 +-
 src/backend/commands/explain.c                |   3 +
 src/backend/optimizer/plan/createplan.c       |   3 +-
 src/backend/optimizer/plan/planner.c          | 155 +++++---
 src/include/nodes/nodes.h                     |   3 +-
 src/include/nodes/pathnodes.h                 |   1 +
 src/include/nodes/plannodes.h                 |   1 +
 src/test/regress/expected/aggregates.out      | 107 ++++--
 src/test/regress/expected/select_distinct.out | 335 ++++++++++++++++++
 src/test/regress/sql/aggregates.sql           |  17 +
 src/test/regress/sql/select_distinct.sql      | 118 ++++++
 11 files changed, 668 insertions(+), 103 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index e941b7e538..88441568b7 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2904,22 +2904,20 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 f
 -- Outer query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
- Unique
+                                                       QUERY PLAN                                                       
+------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: ((SubPlan 1))
-   ->  Sort
-         Output: ((SubPlan 1))
-         Sort Key: ((SubPlan 1))
-         ->  Foreign Scan
-               Output: (SubPlan 1)
-               Relations: Aggregate on (public.ft2 t2)
-               Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
-               SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+   Sort Key: ((SubPlan 1))
+   ->  Foreign Scan
+         Output: (SubPlan 1)
+         Relations: Aggregate on (public.ft2 t2)
+         Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
+         SubPlan 1
+           ->  Foreign Scan on public.ft1 t1
+                 Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                 Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(11 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7ae6131676..466f810c60 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1929,6 +1929,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			show_agg_keys(castNode(AggState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			show_hashagg_info((AggState *) planstate, es);
+			if (((Agg *) plan)->input_unique &&
+				(es->format != EXPLAIN_FORMAT_TEXT || es->verbose))
+				ExplainPropertyBool("Input Unique", true, es);
 			if (plan->qual)
 				show_instrumentation_count("Rows Removed by Filter", 1,
 										   planstate, es);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 9941dfe65e..0049d22227 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -6353,7 +6353,7 @@ make_agg(List *tlist, List *qual,
 	/* Reduce to long, but 'ware overflow! */
 	numGroups = (long) Min(dNumGroups, (double) LONG_MAX);
 
-	node->aggstrategy = aggstrategy;
+	node->aggstrategy = aggstrategy == AGG_UNIQUE ? AGG_SORTED : aggstrategy;
 	node->aggsplit = aggsplit;
 	node->numCols = numGroupCols;
 	node->grpColIdx = grpColIdx;
@@ -6364,6 +6364,7 @@ make_agg(List *tlist, List *qual,
 	node->aggParams = NULL;		/* SS_finalize_plan() will fill this */
 	node->groupingSets = groupingSets;
 	node->chain = chain;
+	node->input_unique = aggstrategy == AGG_UNIQUE;
 
 	plan->qual = qual;
 	plan->targetlist = tlist;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 6f1d9b6b2f..d2d07f54f1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3815,6 +3815,25 @@ create_grouping_paths(PlannerInfo *root,
 	Query	   *parse = root->parse;
 	RelOptInfo *grouped_rel;
 	RelOptInfo *partially_grouped_rel;
+	List	*groupExprs = NIL;
+	bool	group_unique_input = false;
+
+	if (root->parse->groupingSets == NIL)
+	{
+		groupExprs  = get_sortgrouplist_exprs(parse->groupClause,
+														parse->targetList);
+		/*
+		 * If the GROUP BY clause is already unique, a grouping node is not
+		 * necessary when there are no aggregate functions.
+		 */
+		group_unique_input = relation_has_uniquekeys_for(root, input_rel, groupExprs);
+		if (group_unique_input &&
+			groupExprs != NIL &&
+			!parse->hasAggs &&
+			!parse->hasWindowFuncs &&
+			parse->havingQual == NULL)
+			return input_rel;
+	}
 
 	/*
 	 * Create grouping relation to hold fully aggregated grouping and/or
@@ -3834,51 +3853,60 @@ create_grouping_paths(PlannerInfo *root,
 		int			flags = 0;
 		GroupPathExtraData extra;
 
-		/*
-		 * Determine whether it's possible to perform sort-based
-		 * implementations of grouping.  (Note that if groupClause is empty,
-		 * grouping_is_sortable() is trivially true, and all the
-		 * pathkeys_contained_in() tests will succeed too, so that we'll
-		 * consider every surviving input path.)
-		 *
-		 * If we have grouping sets, we might be able to sort some but not all
-		 * of them; in this case, we need can_sort to be true as long as we
-		 * must consider any sorted-input plan.
-		 */
-		if ((gd && gd->rollups != NIL)
-			|| grouping_is_sortable(parse->groupClause))
-			flags |= GROUPING_CAN_USE_SORT;
+		if (group_unique_input)
+		{
+			Assert(parse->groupClause != NIL);
+			Assert(gd == NULL);
+			flags |= GROUPING_INPUT_UNIQUE;
+		}
+		else
+		{
+			/*
+			 * Determine whether it's possible to perform sort-based
+			 * implementations of grouping.  (Note that if groupClause is empty,
+			 * grouping_is_sortable() is trivially true, and all the
+			 * pathkeys_contained_in() tests will succeed too, so that we'll
+			 * consider every surviving input path.)
+			 *
+			 * If we have grouping sets, we might be able to sort some but not all
+			 * of them; in this case, we need can_sort to be true as long as we
+			 * must consider any sorted-input plan.
+			 */
+			if ((gd && gd->rollups != NIL)
+				|| grouping_is_sortable(parse->groupClause))
+				flags |= GROUPING_CAN_USE_SORT;
 
-		/*
-		 * Determine whether we should consider hash-based implementations of
-		 * grouping.
-		 *
-		 * Hashed aggregation only applies if we're grouping. If we have
-		 * grouping sets, some groups might be hashable but others not; in
-		 * this case we set can_hash true as long as there is nothing globally
-		 * preventing us from hashing (and we should therefore consider plans
-		 * with hashes).
-		 *
-		 * Executor doesn't support hashed aggregation with DISTINCT or ORDER
-		 * BY aggregates.  (Doing so would imply storing *all* the input
-		 * values in the hash table, and/or running many sorts in parallel,
-		 * either of which seems like a certain loser.)  We similarly don't
-		 * support ordered-set aggregates in hashed aggregation, but that case
-		 * is also included in the numOrderedAggs count.
-		 *
-		 * Note: grouping_is_hashable() is much more expensive to check than
-		 * the other gating conditions, so we want to do it last.
-		 */
-		if ((parse->groupClause != NIL &&
-			 agg_costs->numOrderedAggs == 0 &&
-			 (gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
-			flags |= GROUPING_CAN_USE_HASH;
+			/*
+			 * Determine whether we should consider hash-based implementations of
+			 * grouping.
+			 *
+			 * Hashed aggregation only applies if we're grouping. If we have
+			 * grouping sets, some groups might be hashable but others not; in
+			 * this case we set can_hash true as long as there is nothing globally
+			 * preventing us from hashing (and we should therefore consider plans
+			 * with hashes).
+			 *
+			 * Executor doesn't support hashed aggregation with DISTINCT or ORDER
+			 * BY aggregates.  (Doing so would imply storing *all* the input
+			 * values in the hash table, and/or running many sorts in parallel,
+			 * either of which seems like a certain loser.)  We similarly don't
+			 * support ordered-set aggregates in hashed aggregation, but that case
+			 * is also included in the numOrderedAggs count.
+			 *
+			 * Note: grouping_is_hashable() is much more expensive to check than
+			 * the other gating conditions, so we want to do it last.
+			 */
+			if ((parse->groupClause != NIL &&
+				 agg_costs->numOrderedAggs == 0 &&
+				 (gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
+				flags |= GROUPING_CAN_USE_HASH;
 
-		/*
-		 * Determine whether partial aggregation is possible.
-		 */
-		if (can_partial_agg(root, agg_costs))
-			flags |= GROUPING_CAN_PARTIAL_AGG;
+			/*
+			 * Determine whether partial aggregation is possible.
+			 */
+			if (can_partial_agg(root, agg_costs))
+				flags |= GROUPING_CAN_PARTIAL_AGG;
+		}
 
 		extra.flags = flags;
 		extra.target_parallel_safe = target_parallel_safe;
@@ -4741,6 +4769,12 @@ create_distinct_paths(PlannerInfo *root,
 	bool		allow_hash;
 	Path	   *path;
 	ListCell   *lc;
+	List	   *distinctExprs =  get_sortgrouplist_exprs(parse->distinctClause,
+														 parse->targetList);
+
+	/* If the result is unique already, we just return the input_rel directly */
+	if (relation_has_uniquekeys_for(root, input_rel, distinctExprs))
+		return input_rel;
 
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
@@ -4778,10 +4812,6 @@ create_distinct_paths(PlannerInfo *root,
 		/*
 		 * Otherwise, the UNIQUE filter has effects comparable to GROUP BY.
 		 */
-		List	   *distinctExprs;
-
-		distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
-												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
 											  NULL);
@@ -6501,9 +6531,40 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	ListCell   *lc;
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+	bool		group_input_unique = (extra->flags & GROUPING_INPUT_UNIQUE) != 0;
 	List	   *havingQual = (List *) extra->havingQual;
 	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
 
+	if (group_input_unique)
+	{
+		Path *path = input_rel->cheapest_total_path;
+		add_path(grouped_rel, (Path *) create_agg_path(root,
+													   grouped_rel,
+													   path,
+													   grouped_rel->reltarget,
+													   AGG_UNIQUE,
+													   AGGSPLIT_SIMPLE,
+													   parse->groupClause,
+													   havingQual,
+													   agg_costs,
+													   dNumGroups));
+
+		if (path != input_rel->cheapest_startup_path)
+		{
+			path = input_rel->cheapest_startup_path;
+			add_path(grouped_rel, (Path *) create_agg_path(root,
+														   grouped_rel,
+														   path,
+														   grouped_rel->reltarget,
+														   AGG_UNIQUE,
+														   AGGSPLIT_SIMPLE,
+														   parse->groupClause,
+														   havingQual,
+														   agg_costs,
+														   dNumGroups));
+		}
+		return;
+	}
 	if (can_sort)
 	{
 		/*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 41110ed888..010266ed4b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -761,7 +761,8 @@ typedef enum AggStrategy
 	AGG_PLAIN,					/* simple agg across all input rows */
 	AGG_SORTED,					/* grouped agg, input must be sorted */
 	AGG_HASHED,					/* grouped agg, use internal hashtable */
-	AGG_MIXED					/* grouped agg, hash and sort both used */
+	AGG_MIXED,					/* grouped agg, hash and sort both used */
+	AGG_UNIQUE					/* grouped agg, the group clause is unique */
 } AggStrategy;
 
 /*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index f7f29f8018..82eff8dc0c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2479,6 +2479,7 @@ typedef struct JoinPathExtraData
 #define GROUPING_CAN_USE_SORT       0x0001
 #define GROUPING_CAN_USE_HASH       0x0002
 #define GROUPING_CAN_PARTIAL_AGG	0x0004
+#define GROUPING_INPUT_UNIQUE		0x0008
 
 /*
  * What kind of partitionwise aggregation is in use?
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 55f363f70c..e8300d9f37 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -828,6 +828,7 @@ typedef struct Agg
 	/* Note: planner provides numGroups & aggParams only in HASHED/MIXED case */
 	List	   *groupingSets;	/* grouping sets to use */
 	List	   *chain;			/* chained Agg/Sort nodes */
+	bool		input_unique;   /* The input is unique already */
 } Agg;
 
 /* ----------------
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 7b574efb0c..65912142f8 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -870,14 +870,12 @@ explain (costs off)
   select distinct max(unique2) from tenk1;
                              QUERY PLAN                              
 ---------------------------------------------------------------------
- HashAggregate
-   Group Key: $0
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
                  Index Cond: (unique2 IS NOT NULL)
-   ->  Result
-(7 rows)
+(5 rows)
 
 select distinct max(unique2) from tenk1;
  max  
@@ -1036,7 +1034,7 @@ explain (costs off)
   select distinct min(f1), max(f1) from minmaxtest;
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
- Unique
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Merge Append
@@ -1059,10 +1057,7 @@ explain (costs off)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
-   ->  Sort
-         Sort Key: ($0), ($1)
-         ->  Result
-(26 rows)
+(23 rows)
 
 select distinct min(f1), max(f1) from minmaxtest;
  min | max 
@@ -1092,12 +1087,10 @@ create temp table t2 (x int, y int, z int, primary key (x, y));
 create temp table t3 (a int, b int, c int, primary key(a, b) deferrable);
 -- Non-primary-key columns can be removed from GROUP BY
 explain (costs off) select * from t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 -- No removal can happen if the complete PK is not present in GROUP BY
 explain (costs off) select a,c from t1 group by a,c,d;
@@ -1112,16 +1105,14 @@ explain (costs off) select a,c from t1 group by a,c,d;
 explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.c, t1.d, t2.x, t2.y
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
@@ -1161,12 +1152,10 @@ explain (costs off) select * from t1 group by a,b,c,d;
 
 -- Okay to remove columns if we're only querying the parent.
 explain (costs off) select * from only t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 create temp table p_t1 (
   a int,
@@ -1179,14 +1168,12 @@ create temp table p_t1_1 partition of p_t1 for values in(1);
 create temp table p_t1_2 partition of p_t1 for values in(2);
 -- Ensure we can remove non-PK columns for partitioned tables.
 explain (costs off) select * from p_t1 group by a,b,c,d;
-           QUERY PLAN           
---------------------------------
- HashAggregate
-   Group Key: p_t1.a, p_t1.b
-   ->  Append
-         ->  Seq Scan on p_t1_1
-         ->  Seq Scan on p_t1_2
-(5 rows)
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on p_t1_1
+   ->  Seq Scan on p_t1_2
+(3 rows)
 
 drop table t1 cascade;
 NOTICE:  drop cascades to table t1c
@@ -2557,6 +2544,46 @@ set work_mem to default;
 ----+----+----
 (0 rows)
 
+create table agg_unique_1(pk int primary key,  b int);
+create table agg_unique_2(a int, unsortable_col xid);
+insert into agg_unique_2 values(1, '1'), (2, '2'), (2, '1');
+explain (costs off, verbose)  select pk, sum(b) from agg_unique_1
+group by pk;
+              QUERY PLAN               
+---------------------------------------
+ GroupAggregate
+   Output: pk, sum(b)
+   Group Key: agg_unique_1.pk
+   Input Unique: true
+   ->  Seq Scan on public.agg_unique_1
+         Output: pk, b
+(6 rows)
+
+explain (costs off, verbose) select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ GroupAggregate
+   Output: agg_unique_2.unsortable_col, count(*)
+   Group Key: agg_unique_2.unsortable_col
+   Input Unique: true
+   ->  HashAggregate
+         Output: agg_unique_2.unsortable_col
+         Group Key: agg_unique_2.unsortable_col
+         ->  Seq Scan on public.agg_unique_2
+               Output: agg_unique_2.a, agg_unique_2.unsortable_col
+(9 rows)
+
+select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+ unsortable_col | count 
+----------------+-------
+              2 |     1
+              1 |     1
+(2 rows)
+
 drop table agg_group_1;
 drop table agg_group_2;
 drop table agg_group_3;
@@ -2565,3 +2592,5 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_unique_1;
+drop table agg_unique_2;
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 11c6f50fbf..227bc27af4 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -306,3 +306,338 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+    QUERY PLAN    
+------------------
+ Seq Scan on uqk1
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+           QUERY PLAN            
+---------------------------------
+ HashAggregate
+   Group Key: d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+     QUERY PLAN     
+--------------------
+ Seq Scan on uqk1
+   Filter: (pk = 1)
+(2 rows)
+
+-- Test join
+-- both uqk1(c, d) and uqk2(pk) are unique keys, so distinct is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+           QUERY PLAN            
+---------------------------------
+ Hash Join
+   Hash Cond: (uqk1.a = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+   ->  Hash
+         ->  Seq Scan on uqk2
+(6 rows)
+
+-- Distinct is needed because of the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+                 QUERY PLAN                  
+---------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.pk
+         ->  Hash Right Join
+               Hash Cond: (uqk1.a = uqk2.pk)
+               ->  Seq Scan on uqk1
+               ->  Hash
+                     ->  Seq Scan on uqk2
+(8 rows)
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+ pk 
+----
+  1
+   
+(2 rows)
+
+-- Distinct is not needed since uqk1 is the left table in outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+           QUERY PLAN            
+---------------------------------
+ Sort
+   Sort Key: uqk1.c, uqk1.d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+ c | d 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+(3 rows)
+
+-- Distinct is not needed: both UNIQUE keys are shown in the targetlist
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+          QUERY PLAN          
+------------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+   ->  Materialize
+         ->  Seq Scan on uqk2
+(4 rows)
+
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+ a | pk | c | d | a | pk | c | d 
+---+----+---+---+---+----+---+---
+ 1 |  1 | 1 | 1 | 1 |  1 | 1 | 1
+ 1 |  1 | 1 | 1 | 4 |  4 | 4 | 4
+ 1 |  1 | 1 | 1 | 5 |  5 | 5 | 5
+ 2 |  2 | 2 | 2 | 1 |  1 | 1 | 1
+ 2 |  2 | 2 | 2 | 4 |  4 | 4 | 4
+ 2 |  2 | 2 | 2 | 5 |  5 | 5 | 5
+ 3 |  3 | 3 | 3 | 1 |  1 | 1 | 1
+ 3 |  3 | 3 | 3 | 4 |  4 | 4 | 4
+ 3 |  3 | 3 | 3 | 5 |  5 | 5 | 5
+(9 rows)
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+           QUERY PLAN           
+--------------------------------
+ Hash Semi Join
+   Hash Cond: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+   ->  Hash
+         ->  Seq Scan on uqk2
+(5 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+             QUERY PLAN             
+------------------------------------
+ Seq Scan on uqk1
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on uqk2
+(4 rows)
+
+-- Test Unique Key FOR the one-row case; DISTINCT is NOT needed as well.
+-- uqk1.d is a uniquekey due to the onerow rule; uqk2.pk is the pk
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+            QUERY PLAN             
+-----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+(5 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both uqk1.d AND uqk2.d are uniquekeys due to the onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+            QUERY PLAN            
+----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(6 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both UniqueKey in targetList
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+        QUERY PLAN        
+--------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(5 rows)
+
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+ c | c 
+---+---
+ 2 | 1
+(1 row)
+
+-- Test SubQuery
+-- t2(a, c) is a UNIQUE key because of the GROUP BY
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+              QUERY PLAN               
+---------------------------------------
+ Hash Join
+   Hash Cond: ((sum(uqk2.pk)) = t1.pk)
+   ->  HashAggregate
+         Group Key: uqk2.a, uqk2.c
+         ->  Seq Scan on uqk2
+   ->  Hash
+         ->  Seq Scan on uqk1 t1
+(7 rows)
+
+-- Test partitioned TABLE
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+-- CREATE UNIQUE INDEX ON dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP INDEX dist_p_uk_b_c;
+-- we also support CREATE UNIQUE INDEX ON each child table
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- NOT ok, since dist_p1 has no such INDEX
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ HashAggregate
+   Group Key: dist_p.a, dist_p.b, dist_p.c, dist_p.d
+   ->  Append
+         ->  Seq Scan on dist_p0 dist_p_1
+         ->  Seq Scan on dist_p1 dist_p_2
+(5 rows)
+
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP INDEX dist_p0_uk_bc;
+DROP INDEX dist_p1_uk_bc;
+-- uk is the same ON all child tables; however, it doesn't include the partkey, so NOT ok as well.
+CREATE UNIQUE INDEX dist_p0_uk_c ON dist_p0(c);
+CREATE UNIQUE INDEX dist_p1_uk_c ON dist_p1(c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ HashAggregate
+   Group Key: dist_p.a, dist_p.b, dist_p.c, dist_p.d
+   ->  Append
+         ->  Seq Scan on dist_p0 dist_p_1
+         ->  Seq Scan on dist_p1 dist_p_2
+(5 rows)
+
+DROP TABLE dist_p;
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+DEALLOCATE pt;
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 3446c3e9fd..3dc40b13d5 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1159,6 +1159,21 @@ set work_mem to default;
   union all
 (select * from agg_group_4 except select * from agg_hash_4);
 
+create table agg_unique_1(pk int primary key,  b int);
+create table agg_unique_2(a int, unsortable_col xid);
+insert into agg_unique_2 values(1, '1'), (2, '2'), (2, '1');
+
+explain (costs off, verbose)  select pk, sum(b) from agg_unique_1
+group by pk;
+
+explain (costs off, verbose) select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+
+select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+
 drop table agg_group_1;
 drop table agg_group_2;
 drop table agg_group_3;
@@ -1167,3 +1182,5 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_unique_1;
+drop table agg_unique_2;
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 33102744eb..72716217c0 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -135,3 +135,121 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+
+
+-- Test join
+-- Both uqk1(c, d) and uqk2(pk) are unique keys, so DISTINCT is not needed.
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+
+-- DISTINCT is needed because of the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+
+-- DISTINCT is not needed since uqk1 is the left table in the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+-- DISTINCT is not needed even without join clauses, since both unique keys appear in the targetlist
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+
+-- Test UniqueKey for the one-row case; DISTINCT is not needed there either.
+-- uqk1.d is a uniquekey due to the onerow rule; uqk2.pk is the primary key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+
+-- Both uqk1.d and uqk2.d are uniquekeys due to the onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+
+-- Both UniqueKeys are in the targetlist
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+
+-- Test SubQuery
+-- t2(a, c) is a unique key because of the GROUP BY
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+
+-- Test partitioned table
+
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+
+-- Create a unique index on dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+DROP INDEX dist_p_uk_b_c;
+
+-- We also support creating a unique index on each child table
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- Not OK, since dist_p1 has no such index
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+
+DROP INDEX dist_p0_uk_bc;
+DROP INDEX dist_p1_uk_bc;
+
+-- The uk is the same on all child tables, but it doesn't include the partkey, so not OK either.
+CREATE UNIQUE INDEX dist_p0_uk_c ON dist_p0(c);
+CREATE UNIQUE INDEX dist_p1_uk_c ON dist_p1(c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+
+DROP TABLE dist_p;
+
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+DEALLOCATE pt;
+
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
-- 
2.21.0

Attachment: v6-0001-Introduce-UniqueKeys-to-determine-RelOptInfo-uniq.patch (application/octet-stream)
From 86105f410d25cbecf5bc02f3da6965150d15676f Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com" <dgrowley@gmail.com>
Date: Tue, 14 Apr 2020 19:44:15 +1200
Subject: [PATCH v6 1/2] Introduce UniqueKeys to determine RelOptInfo unique
 properties

---
 .../postgres_fdw/expected/postgres_fdw.out    |    4 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |    1 +
 src/backend/nodes/list.c                      |   31 +
 src/backend/nodes/makefuncs.c                 |   16 +
 src/backend/optimizer/path/Makefile           |    3 +-
 src/backend/optimizer/path/allpaths.c         |   25 +-
 src/backend/optimizer/path/joinrels.c         |    2 +
 src/backend/optimizer/path/uniquekeys.c       | 1116 +++++++++++++++++
 src/backend/optimizer/plan/analyzejoins.c     |  143 +--
 src/backend/optimizer/plan/initsplan.c        |   10 +
 src/backend/optimizer/plan/planmain.c         |   13 -
 src/backend/optimizer/plan/planner.c          |   10 +-
 src/backend/optimizer/prep/prepunion.c        |    2 +
 src/backend/optimizer/util/plancat.c          |   10 +
 src/include/nodes/makefuncs.h                 |    3 +
 src/include/nodes/nodes.h                     |    1 +
 src/include/nodes/pathnodes.h                 |   32 +
 src/include/nodes/pg_list.h                   |    2 +
 src/include/optimizer/paths.h                 |   34 +
 src/test/regress/expected/aggregates.out      |   10 +-
 src/test/regress/expected/join.out            |   59 +-
 src/test/regress/sql/aggregates.sql           |    6 +-
 src/test/regress/sql/join.sql                 |   16 +-
 23 files changed, 1368 insertions(+), 181 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekeys.c

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 62c2697920..e941b7e538 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1378,6 +1378,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1
 (8 rows)
 
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY: The inner table in the plan is "T 3", which has a primary key on c1; that's why we now get the newly added "Inner Unique: true".
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
                                                                                                                                                                                              QUERY PLAN                                                                                                                                                                                             
@@ -1386,6 +1387,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
    Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
    ->  Nested Loop
          Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+         Inner Unique: true
          ->  Foreign Scan
                Output: ft4.c1, ft4.*, ft5.c1, ft5.*
                Relations: (public.ft4) FULL JOIN (public.ft5)
@@ -1410,7 +1412,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
                ->  Seq Scan on "S 1"."T 3"
                      Output: "T 3".c1, "T 3".ctid
                      Filter: ("T 3".c1 = 50)
-(28 rows)
+(29 rows)
 
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
  c1 | a  | b  
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..a42cfa134d 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -453,6 +453,7 @@ EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY: The inner table in the plan is "T 3", which has a primary key on c1; that's why we now get the newly added "Inner Unique: true".
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 80fa8c84e4..aa47eaed36 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -687,6 +687,37 @@ list_member_oid(const List *list, Oid datum)
 	return false;
 }
 
+/*
+ * Return true iff there is an equal member in target for every
+ * member in members
+ */
+bool
+list_is_subset(const List *members, const List *target)
+{
+	const ListCell	*lc1, *lc2;
+
+	Assert(IsPointerList(members));
+	Assert(IsPointerList(target));
+	check_list_invariants(members);
+	check_list_invariants(target);
+
+	foreach(lc1, members)
+	{
+		bool found = false;
+		foreach(lc2, target)
+		{
+			if (equal(lfirst(lc1), lfirst(lc2)))
+			{
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			return false;
+	}
+	return true;
+}
+
 /*
  * Delete the n'th cell (counting from 0) in list.
  *
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index b442b5a29e..fe2d416293 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -812,3 +812,19 @@ makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols)
 	v->va_cols = va_cols;
 	return v;
 }
+
+
+/*
+ * makeUniqueKey - build a UniqueKey node from the given fields
+ */
+UniqueKey*
+makeUniqueKey(List *exprs, List* positions, bool multi_nullvals, bool onerow)
+{
+	UniqueKey * ukey = makeNode(UniqueKey);
+	Assert(list_length(exprs) == list_length(positions));
+	ukey->exprs = exprs;
+	ukey->positions = positions;
+	ukey->multi_nullvals = multi_nullvals;
+	ukey->onerow = onerow;
+	return ukey;
+}
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..7b9820c25f 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekeys.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 255f56b827..166e3bd0ba 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -39,6 +39,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
@@ -222,13 +223,24 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	set_base_rel_pathlists(root);
 
 	/*
-	 * Generate access paths for the entire join tree.
+	 * Remove any useless outer joins.  Ideally this would be done during
+	 * jointree preprocessing, but the necessary information isn't available
+	 * until we've built baserel data structures, classified qual clauses
+	 * and uniquekeys
 	 */
-	rel = make_rel_from_joinlist(root, joinlist);
+	joinlist = remove_useless_joins(root, joinlist);
+
+	/*
+	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
+	 * Likewise, this can't be done until now for lack of needed info.
+	 */
+	reduce_unique_semijoins(root);
 
 	/*
-	 * The result should join all and only the query's base rels.
+	 * Generate access paths for the entire join tree.
 	 */
+	rel = make_rel_from_joinlist(root, joinlist);
+
 	Assert(bms_equal(rel->relids, root->all_baserels));
 
 	return rel;
@@ -786,6 +798,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
+
+	/* Set UniqueKeys for this relation */
+	populate_baserel_uniquekeys(root, rel, rel->indexlist);
 }
 
 /*
@@ -1276,6 +1291,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Add paths to the append relation. */
 	add_paths_to_append_rel(root, rel, live_childrels);
+	if (IS_PARTITIONED_REL(rel))
+		populate_partitionedrel_uniquekeys(root, rel, live_childrels);
 }
 
 
@@ -2349,6 +2366,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 										  pathkeys, required_outer));
 	}
 
+	convert_subquery_uniquekeys(root, rel, sub_final_rel);
+
 	/* If outer rel allows parallelism, do same for partial paths. */
 	if (rel->consider_parallel && bms_is_empty(required_outer))
 	{
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 4e1650994d..c0d2332caf 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -924,6 +924,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
 
 	/* Apply partitionwise join technique, if possible. */
 	try_partitionwise_join(root, rel1, rel2, joinrel, sjinfo, restrictlist);
+
+	populate_joinrel_uniquekeys(root, joinrel, rel1, rel2, restrictlist, sjinfo->jointype);
 }
 
 
diff --git a/src/backend/optimizer/path/uniquekeys.c b/src/backend/optimizer/path/uniquekeys.c
new file mode 100644
index 0000000000..1116ab2cb0
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekeys.c
@@ -0,0 +1,1116 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekeys.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/optimizer.h"
+#include "rewrite/rewriteManip.h"
+
+
+/*
+ * This struct is used to help populate_joinrel_uniquekeys.
+ * added_to_joinrel is set to true once a uniquekey has been added to joinrel.
+ * For a joinrel, if both sides have a UniqueKey, then the combination of them
+ * must be unique for the joinrel as well, but we don't need to add it if
+ * either of them has been added to joinrel already. We use this struct to
+ * maintain such info.
+ */
+typedef struct UniqueKeyContextData
+{
+	UniqueKey	*uniquekey;
+	/* Set to true if the unique key has been added to joinrel->uniquekeys */
+	bool	added_to_joinrel;
+	/* If this uniquekey is still useful after join */
+	bool	useful;
+} *UniqueKeyContext;
+
+
+static List *gather_mergeable_baserestrictlist(RelOptInfo *rel);
+static List *gather_mergeable_joinclauses(RelOptInfo *joinrel,
+										  RelOptInfo *rel1,
+										  RelOptInfo *rel2,
+										  List *restrictlist,
+										  JoinType jointype);
+static bool match_index_to_baserestrictinfo(IndexOptInfo *unique_ind,
+											List *restrictlist);
+static List *initililze_uniquecontext_for_joinrel(RelOptInfo *joinrel,
+												  RelOptInfo *inputrel);
+
+static bool innerrel_keeps_unique(PlannerInfo *root,
+								  RelOptInfo *outerrel,
+								  RelOptInfo *innerrel,
+								  List *restrictlist,
+								  bool reverse);
+static bool clause_sides_match_join(RestrictInfo *rinfo,
+									Relids outerrelids,
+									Relids innerrelids);
+static void add_uniquekey_from_index(RelOptInfo *rel,
+									 IndexOptInfo *unique_index);
+static void add_uniquekey_for_onerow(RelOptInfo *rel);
+
+/* Used for unique-index checking on partitioned tables */
+static bool index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind);
+static IndexOptInfo *simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+													 IndexOptInfo *from);
+static bool simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2);
+static void adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+											  RelOptInfo *childrel,
+											  List **global_unique_index);
+/* Helper function for grouprel/distinctrel */
+static void add_uniquekey_from_sortgroups(PlannerInfo *root,
+										  RelOptInfo *rel,
+										  List *sortgroups);
+
+/*
+ * populate_baserel_uniquekeys
+ *		Populate the uniquekeys list of 'baserel' by looking at the rel's
+ *		unique indexes and baserestrictinfo
+ */
+void
+populate_baserel_uniquekeys(PlannerInfo *root,
+							RelOptInfo *baserel,
+							List *indexlist)
+{
+	ListCell *lc;
+	List	*restrictlist = gather_mergeable_baserestrictlist(baserel);
+	bool	return_one_row = false;
+	List	*matched_uk_indexes = NIL;
+
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	if (baserel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+		/*
+		 * Setting UniqueKeys on a member rel is useless; we have to recompute
+		 * them at the upper level, see populate_partitionedrel_uniquekeys
+		 */
+		return;
+
+	foreach(lc, indexlist)
+	{
+		IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		if (match_index_to_baserestrictinfo(ind, restrictlist))
+		{
+			return_one_row = true;
+			break;
+		}
+
+		if (ind->indexprs != NIL)
+			/* We can't guarantee whether an expression returns NULL values, so ignore it */
+			continue;
+		matched_uk_indexes = lappend(matched_uk_indexes, ind);
+	}
+
+	if (return_one_row)
+	{
+		add_uniquekey_for_onerow(baserel);
+	}
+	else
+	{
+		foreach(lc, matched_uk_indexes)
+			add_uniquekey_from_index(baserel, lfirst_node(IndexOptInfo, lc));
+	}
+}
+
+
+/*
+ * populate_partitionedrel_uniquekeys
+ * A unique index can be used for a UniqueKey only if:
+ * 1). It includes the partition keys.
+ * 2). All the childrels have the same index.
+ */
+void
+populate_partitionedrel_uniquekeys(PlannerInfo *root,
+								   RelOptInfo *rel,
+								   List *childrels)
+{
+	ListCell	*lc;
+	List	*global_unique_indexlist = NIL;
+	RelOptInfo *childrel;
+	bool is_first = true;
+
+	Assert(IS_PARTITIONED_REL(rel));
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	if (childrels == NIL)
+		return;
+
+	childrel = linitial_node(RelOptInfo, childrels);
+	foreach(lc, childrel->indexlist)
+	{
+		IndexOptInfo *ind = lfirst(lc);
+		IndexOptInfo *global_ind;
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		global_ind = simple_copy_indexinfo_to_parent(rel, ind);
+		/*
+		 * If the unique index doesn't contain partkey, then it is unique
+		 * on this partition only, so it is useless for us.
+		 */
+		if (!index_constains_partkey(rel, global_ind))
+			continue;
+		global_unique_indexlist = lappend(global_unique_indexlist,  global_ind);
+	}
+
+	/* Fast path */
+	if (global_unique_indexlist == NIL)
+		return;
+
+	foreach(lc, childrels)
+	{
+		RelOptInfo *child = lfirst(lc);
+		if (is_first)
+		{
+			is_first = false;
+			continue;
+		}
+		adjust_partition_unique_indexlist(rel, child, &global_unique_indexlist);
+	}
+
+	/* Now we have the list of unique indexes which are exactly the same on
+	 * all childrels.  Set the UniqueKeys just as for a non-partitioned table.
+	 */
+	populate_baserel_uniquekeys(root, rel, global_unique_indexlist);
+}
+
+
+/*
+ * populate_distinctrel_uniquekeys
+ */
+void
+populate_distinctrel_uniquekeys(PlannerInfo *root,
+								RelOptInfo *inputrel,
+								RelOptInfo *distinctrel)
+{
+	/* The unique key before the distinct is still valid */
+	distinctrel->uniquekeys = list_copy(inputrel->uniquekeys);
+	add_uniquekey_from_sortgroups(root, distinctrel, root->parse->distinctClause);
+}
+
+/*
+ * populate_grouprel_uniquekeys
+ */
+void
+populate_grouprel_uniquekeys(PlannerInfo *root,
+							 RelOptInfo *grouprel)
+{
+	Query *parse = root->parse;
+	if (parse->hasTargetSRFs)
+		return;
+
+	if (parse->groupingSets)
+		return;
+
+	/* A normal GROUP BY without grouping sets */
+	if (parse->groupClause)
+		add_uniquekey_from_sortgroups(root,
+									  grouprel,
+									  root->parse->groupClause);
+	else
+		/* It has aggregation but no GROUP BY, so it must return a single row */
+		add_uniquekey_for_onerow(grouprel);
+}
+
+/*
+ * simple_copy_uniquekeys
+ * Using a function for this one-liner makes it easy to find the places
+ * where we simply copy the uniquekeys.
+ */
+void
+simple_copy_uniquekeys(RelOptInfo *oldrel,
+					   RelOptInfo *newrel)
+{
+	newrel->uniquekeys = oldrel->uniquekeys;
+}
+
+/*
+ *  populate_unionrel_uniquiekeys
+ */
+void
+populate_unionrel_uniquiekeys(PlannerInfo *root,
+							  RelOptInfo *unionrel)
+{
+	ListCell	*lc;
+	List	*exprs = NIL;
+	List	*colnos = NIL;
+	int i = 1;
+
+	Assert(unionrel->uniquekeys == NIL);
+
+	foreach(lc,  unionrel->reltarget->exprs)
+	{
+		exprs = lappend(exprs, lfirst(lc));
+		colnos = lappend_int(colnos, i);
+		i++;
+	}
+	unionrel->uniquekeys = lappend(unionrel->uniquekeys,
+								   makeUniqueKey(exprs, colnos, true, false));
+}
+
+/*
+ * populate_joinrel_uniquekeys
+ *
+ * Populate uniquekeys for the joinrel. We check each input relation to see if
+ * its UniqueKey is still valid after the join via innerrel_keeps_unique; if
+ * so, we add it to the joinrel.  The multi_nullvals field is changed from
+ * false to true for some outer join cases.
+ *
+ * For a uniquekey in either input rel which can't stay unique after the join,
+ * we still check whether the combination of UniqueKeys from both sides is
+ * useful for us.  If so, we add it to the joinrel as well; multi_nullvals is
+ * set to true if either side has multi_nullvals set to true.
+ */
+void
+populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List *restrictlist, JoinType jointype)
+{
+	ListCell *lc, *lc2;
+	List	*clause_list = NIL;
+	List	*outerrel_ukey_ctx;
+	List	*innerrel_ukey_ctx;
+	bool	outer_is_onerow, inner_is_onerow;
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	/* Only the outerrel's UniqueKeys matter for SEMI/ANTI joins */
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+	{
+		foreach(lc, outerrel->uniquekeys)
+		{
+			UniqueKey	*uniquekey = lfirst_node(UniqueKey, lc);
+			if (list_is_subset(uniquekey->exprs, joinrel->reltarget->exprs))
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, uniquekey);
+		}
+		return;
+	}
+
+	/* Fast path */
+	if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+		return;
+
+	outer_is_onerow = relation_is_onerow(outerrel);
+	inner_is_onerow = relation_is_onerow(innerrel);
+
+	outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel, outerrel);
+	innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel, innerrel);
+
+	clause_list = gather_mergeable_joinclauses(joinrel, outerrel, innerrel,
+											   restrictlist, jointype);
+
+
+	if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+	{
+		foreach(lc, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				/* The UniqueKey on baserel is not useful on the joinrel */
+				ctx->useful = false;
+				continue;
+			}
+			if ((jointype == JOIN_LEFT || jointype == JOIN_FULL) && !ctx->uniquekey->multi_nullvals)
+			{
+				/* Change multi_nullvals to true in this case */
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+											  makeUniqueKey(ctx->uniquekey->exprs,
+															ctx->uniquekey->positions,
+															true,
+															false));
+			}
+			else if (inner_is_onerow)
+			{
+				/* Since rows in the innerrel can't be duplicated, and the
+				 * innerrel is onerow, the join result will be onerow as well.
+				 * Note: onerow implies multi_nullvals = false.
+				 */
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else
+			{
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, ctx->uniquekey);
+				ctx->added_to_joinrel = true;
+			}
+		}
+	}
+
+	if (innerrel_keeps_unique(root, innerrel, outerrel, clause_list, true))
+	{
+		foreach(lc, outerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+			/* NULL values in outer rel can be duplicated under JOIN_FULL only */
+			if (jointype == JOIN_FULL && ctx->uniquekey->multi_nullvals)
+			{
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+											  makeUniqueKey(ctx->uniquekey->exprs,
+															ctx->uniquekey->positions,
+															true,
+															false));
+
+			}
+			else if (outer_is_onerow)
+			{
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else
+			{
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, ctx->uniquekey);
+				ctx->added_to_joinrel = true;
+			}
+		}
+	}
+
+	/* The combination of the UniqueKeys from both sides is unique as well, regardless
+	 * of join type, but don't bother adding it if a subset of it has been added already.
+	 */
+	foreach(lc, outerrel_ukey_ctx)
+	{
+		UniqueKeyContext ctx1 = (UniqueKeyContext) lfirst(lc);
+		if (ctx1->added_to_joinrel || !ctx1->useful)
+			continue;
+		foreach(lc2, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx2 = (UniqueKeyContext) lfirst(lc2);
+			List	*exprs = NIL, *colnos = NIL;
+			bool multi_nullvals;
+			if (ctx2->added_to_joinrel || !ctx2->useful)
+				continue;
+			exprs = list_copy(ctx1->uniquekey->exprs);
+			colnos = list_copy(ctx1->uniquekey->positions);
+			exprs = list_concat(exprs, ctx2->uniquekey->exprs);
+			colnos = list_concat(colnos, ctx2->uniquekey->positions);
+
+			multi_nullvals = ctx1->uniquekey->multi_nullvals || ctx2->uniquekey->multi_nullvals;
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+										  makeUniqueKey(exprs,
+														colnos,
+														multi_nullvals,
+														/* All onerow cases has been handled above */
+														false));
+		}
+	}
+}
+
+
+/*
+ * Used to avoid multiple scans of rel->reltarget->exprs.  See convert_subquery_uniquekeys
+ */
+typedef struct SubqueryUniqueKeyData
+{
+	/*
+	 * Only a Var referencing the subquery's unique column is unique as well;
+	 * we can't guarantee that for other expressions
+	 */
+	Var *var;
+
+	/* The position of the var in the rel->reltarget */
+	int pos;
+} *SubqueryUniqueKeyContext;
+
+/*
+ * convert_subquery_uniquekeys
+ *
+ * currel is the RelOptInfo at the current query level; sub_final_rel comes
+ * from fetch_upper_rel.  We need to convert the UniqueKeys from sub_final_rel
+ * to currel via the positions info in the UniqueKey.  Example:
+ *
+ * select t2.colx from t1, (select max(y), colx from t3 group by colx) t2 where ..
+ * The UniqueKey in sub_final_rel is Var(varno=1, varattno=N), position = 2;
+ * the UniqueKey in currel will be Var(varno=2, varattno=2), position = 1.
+ */
+void convert_subquery_uniquekeys(PlannerInfo *root,
+								 RelOptInfo *currel,
+								 RelOptInfo *sub_final_rel)
+{
+	SubqueryUniqueKeyContext *ctx_array;
+	SubqueryUniqueKeyContext ctx;
+	Index max_colno_subq = 0;
+	ListCell	*lc, *lc2;
+	int pos = 0;
+
+	if (sub_final_rel->uniquekeys == NIL)
+		/* This should be a common case */
+		return;
+
+	if (relation_is_onerow(sub_final_rel))
+	{
+		add_uniquekey_for_onerow(currel);
+		return;
+	}
+	/*
+	 * Calculate the max colno in the subquery. In fact we could get this from
+	 * list_length(sub_final_rel->reltarget->exprs); however, reltarget is not
+	 * set on the UPPERREL_FINAL relation, so do it this way
+	 */
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey * ukey = lfirst_node(UniqueKey, lc);
+		foreach(lc2, ukey->positions)
+		{
+			Index colno = lfirst_int(lc2);
+			if (max_colno_subq < colno)
+				max_colno_subq = colno;
+		}
+	}
+
+	Assert(max_colno_subq > 0);
+	ctx_array = palloc0(sizeof(SubqueryUniqueKeyContext) * (max_colno_subq + 1));
+
+	/*
+	 * Create an array entry for each Var in currel->reltarget->exprs; the array
+	 * index is the colno in the subquery, so we can find the expr given a colno
+	 */
+	foreach(lc, currel->reltarget->exprs)
+	{
+		Var *var;
+		int colno_subq;
+		pos++;
+		if (!IsA(lfirst(lc), Var))
+			continue;
+
+		var = lfirst_node(Var, lc);
+		colno_subq = var->varattno;
+		if (colno_subq > max_colno_subq)
+			continue;
+		ctx_array[colno_subq] = palloc0(sizeof(struct SubqueryUniqueKeyData));
+		ctx = ctx_array[colno_subq]; /* corresponding to subquery's uniquekey->positions[x] */
+		ctx->pos = pos; /* the position in current targetlist,  will be used to set UniqueKey */
+		ctx->var = var;
+	}
+
+	/* Convert the UniqueKey from sub_final_rel to currel */
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey * ukey = lfirst_node(UniqueKey, lc);
+		bool uniquekey_useful = true;
+		List	*exprs = NIL;
+		List	*colnos = NIL;
+		foreach(lc2, ukey->positions)
+		{
+			Index sub_colno = lfirst_int(lc2);
+			ctx = ctx_array[sub_colno];
+			if (ctx == NULL)
+			{
+				/* The column is not used outside */
+				uniquekey_useful = false;
+				break;
+			}
+			exprs = lappend(exprs, ctx->var);
+			colnos = lappend_int(colnos, ctx->pos);
+		}
+		if (uniquekey_useful)
+			currel->uniquekeys = lappend(currel->uniquekeys,
+										 makeUniqueKey(exprs,
+													   colnos,
+													   ukey->multi_nullvals,
+													   ukey->onerow));
+	}
+}
+
+
+/*
+ * innerrel_keeps_unique
+ *
+ * Check whether a unique key of the innerrel remains valid after the join.
+ * It remains valid if every expr of some unique key of the outerrel is
+ * matched, via a merge operator, by a clause in clause_list.
+ * Note: clause_list must already contain only mergeable restrictinfos.
+ */
+static bool
+innerrel_keeps_unique(PlannerInfo *root,
+					  RelOptInfo *outerrel,
+					  RelOptInfo *innerrel,
+					  List *clause_list,
+					  bool reverse)
+{
+	ListCell	*lc, *lc2, *lc3;
+
+	if (outerrel->uniquekeys == NIL || innerrel->uniquekeys == NIL)
+		return false;
+
+	if (relation_is_onerow(innerrel))
+		return true;
+
+	foreach(lc, outerrel->uniquekeys)
+	{
+		List	*outer_uq_exprs = lfirst_node(UniqueKey, lc)->exprs;
+		bool clauselist_matches_all_exprs = true;
+		foreach(lc2, outer_uq_exprs)
+		{
+			Node *outer_uq_expr = lfirst(lc2);
+			bool found_uq_expr_in_clauselist = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *outer_expr;
+				if (reverse)
+					outer_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				else
+					outer_expr = rinfo->outer_is_left ? get_leftop(rinfo->clause) : get_rightop(rinfo->clause);
+				if (equal(outer_expr, outer_uq_expr))
+				{
+					found_uq_expr_in_clauselist = true;
+					break;
+				}
+			}
+			if (!found_uq_expr_in_clauselist)
+			{
+				/* No need to check the remaining exprs of the current uniquekey */
+				clauselist_matches_all_exprs = false;
+				break;
+			}
+		}
+
+		if (clauselist_matches_all_exprs)
+			/*
+			 * The clause_list matches every expr of this outerrel unique
+			 * key, so the innerrel stays unique, given that
+			 * innerrel->uniquekeys != NIL (checked at the beginning).
+			 */
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * relation_is_onerow
+ * Check if it is a one-row relation by checking UniqueKey
+ */
+bool
+relation_is_onerow(RelOptInfo *rel)
+{
+	UniqueKey *ukey;
+	if (rel->uniquekeys == NIL)
+		return false;
+	ukey = linitial_node(UniqueKey, rel->uniquekeys);
+	if (ukey->onerow)
+	{
+		/* Some cheap sanity checks on the UniqueKey */
+
+		/* 1. We will only store one UniqueKey for this rel */
+		Assert(list_length(rel->uniquekeys) == 1);
+		/* 2. multi_nullvals must be false */
+		Assert(!ukey->multi_nullvals);
+		/* 3. exprs & positions must be NIL */
+		Assert(ukey->exprs == NIL);
+		Assert(ukey->positions == NIL);
+	}
+	return ukey->onerow;
+}
+
+/*
+ * relation_has_uniquekeys_for
+ *		Returns true if we have proofs that 'rel' cannot return multiple rows with
+ *		the same values in each of 'exprs'.  Otherwise returns false.
+ */
+bool
+relation_has_uniquekeys_for(PlannerInfo *root, RelOptInfo *rel, List *exprs)
+{
+	ListCell *lc;
+
+	/*
+	 * In the UniqueKey->onerow case, uniquekey->exprs is NIL as well, so we
+	 * can't rely on list_is_subset to handle that special case.
+	 */
+	if (exprs == NIL)
+		return false;
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		if (ukey->multi_nullvals)
+			continue;
+		if (list_is_subset(ukey->exprs, exprs))
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * Examine the rel's restriction clauses for usable var = const clauses
+ */
+static List*
+gather_mergeable_baserestrictlist(RelOptInfo *rel)
+{
+	List	*restrictlist = NIL;
+	ListCell	*lc;
+	foreach(lc, rel->baserestrictinfo)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
+
+		/*
+		 * Note: can_join won't be set for a restriction clause, but
+		 * mergeopfamilies will be if it has a mergejoinable operator and
+		 * doesn't contain volatile functions.
+		 */
+		if (restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * The clause certainly doesn't refer to anything but the given rel.
+		 * If either side is pseudoconstant then we can use it.
+		 */
+		if (bms_is_empty(restrictinfo->left_relids))
+		{
+			/* righthand side is inner */
+			restrictinfo->outer_is_left = true;
+		}
+		else if (bms_is_empty(restrictinfo->right_relids))
+		{
+			/* lefthand side is inner */
+			restrictinfo->outer_is_left = false;
+		}
+		else
+			continue;
+
+		/* OK, add to list */
+		restrictlist = lappend(restrictlist, restrictinfo);
+	}
+	return restrictlist;
+}
+
+
+/*
+ * gather_mergeable_joinclauses
+ *	Gather the mergejoinable join clauses between outerrel and innerrel,
+ *	marking which side of each clause is the outer side.
+ */
+static List*
+gather_mergeable_joinclauses(RelOptInfo *joinrel,
+							 RelOptInfo *outerrel,
+							 RelOptInfo *innerrel,
+							 List *restrictlist,
+							 JoinType jointype)
+{
+	List	*clause_list = NIL;
+	ListCell	*lc;
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *)lfirst(lc);
+		if (IS_OUTER_JOIN(jointype) &&
+			RINFO_IS_PUSHED_DOWN(restrictinfo, joinrel->relids))
+			continue;
+
+		/* Ignore if it's not a mergejoinable clause */
+		if (!restrictinfo->can_join ||
+			restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * Check if clause has the form "outer op inner" or "inner op outer",
+		 * and if so mark which side is inner.
+		 */
+		if (!clause_sides_match_join(restrictinfo, outerrel->relids, innerrel->relids))
+			continue;			/* no good for these input relations */
+
+		/* OK, add to list */
+		clause_list = lappend(clause_list, restrictinfo);
+	}
+	return clause_list;
+}
+
+
+/*
+ * Return true if every column of the unique index is matched by an
+ * "indexkey = Const" clause in the restrictlist.
+ */
+static bool
+match_index_to_baserestrictinfo(IndexOptInfo *unique_ind, List *restrictlist)
+{
+	int c = 0;
+
+	/* Fast path: fewer clauses than index columns can never match */
+	if (list_length(restrictlist) < unique_ind->ncolumns)
+		return false;
+
+	for(c = 0;  c < unique_ind->ncolumns; c++)
+	{
+		ListCell	*lc;
+		bool	found_in_restrictinfo = false;
+		foreach(lc, restrictlist)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			Node	   *rexpr;
+
+			/*
+			 * The condition's equality operator must be a member of the
+			 * index opfamily, else it is not asserting the right kind of
+			 * equality behavior for this index.  We check this first
+			 * since it's probably cheaper than match_index_to_operand().
+			 */
+			if (!list_member_oid(rinfo->mergeopfamilies, unique_ind->opfamily[c]))
+				continue;
+
+			/*
+			 * XXX at some point we may need to check collations here too.
+			 * For the moment we assume all collations reduce to the same
+			 * notion of equality.
+			 */
+
+			/* OK, see if the condition operand matches the index key */
+			if (rinfo->outer_is_left)
+				rexpr = get_rightop(rinfo->clause);
+			else
+				rexpr = get_leftop(rinfo->clause);
+
+			if (match_index_to_operand(rexpr, c, unique_ind))
+			{
+				found_in_restrictinfo = true;
+				break;
+			}
+		}
+		if (!found_in_restrictinfo)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * add_uniquekey_from_index
+ *	We only add the unique key if all of the index's exprs appear in
+ *	rel->reltarget.
+ */
+static void
+add_uniquekey_from_index(RelOptInfo *rel, IndexOptInfo *unique_index)
+{
+	int	pos;
+	List	*exprs = NIL;
+	List	*positions = NIL;
+	bool	multi_nullvals = false;
+
+	/*
+	 * Fast path: check whether the indexed columns are used in this
+	 * relation at all. If not, return quickly.
+	 */
+	for(pos = 0; pos < unique_index->ncolumns; pos++)
+	{
+		int attno = unique_index->indexkeys[pos] - rel->min_attr;
+		if (bms_is_empty(rel->attr_needed[attno]))
+			return;
+	}
+
+	/* We still need to check the rel->reltarget->exprs to get the exprs and positions */
+	for(pos = 0; pos < unique_index->ncolumns; pos++)
+	{
+		ListCell	*lc;
+		bool	find_in_exprs = false;
+
+		foreach(lc, rel->reltarget->exprs)
+		{
+			Var *var;
+			if (!IsA(lfirst(lc), Var))
+				continue;
+			var = lfirst_node(Var, lc);
+			if (match_index_to_operand((Node *)lfirst(lc), pos, unique_index))
+			{
+				find_in_exprs = true;
+				exprs = lappend(exprs, lfirst(lc));
+				positions = lappend_int(positions, pos+1);
+				if (!bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
+								   rel->notnullattrs))
+					multi_nullvals = true;
+				break;
+			}
+		}
+		if (!find_in_exprs)
+			return;
+	}
+
+	if (exprs != NIL)
+	{
+		rel->uniquekeys = lappend(rel->uniquekeys,
+								  makeUniqueKey(exprs, positions, multi_nullvals, false));
+	}
+}
+
+
+/*
+ * add_uniquekey_for_onerow
+ * If we are sure the relation returns only one row, then all of its columns
+ * are unique. There is no need to create a UniqueKey for every expr;
+ * setting UniqueKey->onerow to true is enough.
+ */
+void
+add_uniquekey_for_onerow(RelOptInfo *rel)
+{
+	rel->uniquekeys = list_make1(makeUniqueKey(NIL, /* No need to set exprs */
+											   NIL, /* No need to set positions */
+											   false, /* onerow can't have multi_nullvals */
+											   true));
+
+}
+
+/*
+ * initililze_uniquecontext_for_joinrel
+ * Return a List of UniqueKeyContext for an inputrel. We also filter out
+ * the uniquekeys which cannot possibly be used later.
+ */
+static List *
+initililze_uniquecontext_for_joinrel(RelOptInfo *joinrel,  RelOptInfo *inputrel)
+{
+	List	*res = NIL;
+	ListCell *lc;
+	foreach(lc,  inputrel->uniquekeys)
+	{
+		UniqueKeyContext context;
+		context = palloc(sizeof(struct UniqueKeyContextData));
+		context->uniquekey = lfirst_node(UniqueKey, lc);
+		context->added_to_joinrel = false;
+		context->useful = true;
+		res = lappend(res, context);
+	}
+	return res;
+}
+
+/*
+ * clause_sides_match_join
+ *	  Determine whether a join clause is of the right form to use in this join.
+ *
+ * We already know that the clause is a binary opclause referencing only the
+ * rels in the current join.  The point here is to check whether it has the
+ * form "outerrel_expr op innerrel_expr" or "innerrel_expr op outerrel_expr",
+ * rather than mixing outer and inner vars on either side.  If it matches,
+ * we set the transient flag outer_is_left to identify which side is which.
+ */
+static bool
+clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
+						Relids innerrelids)
+{
+	if (bms_is_subset(rinfo->left_relids, outerrelids) &&
+		bms_is_subset(rinfo->right_relids, innerrelids))
+	{
+		/* lefthand side is outer */
+		rinfo->outer_is_left = true;
+		return true;
+	}
+	else if (bms_is_subset(rinfo->left_relids, innerrelids) &&
+			 bms_is_subset(rinfo->right_relids, outerrelids))
+	{
+		/* righthand side is outer */
+		rinfo->outer_is_left = false;
+		return true;
+	}
+	return false;				/* no good for these input relations */
+}
+
+
+/*
+ * Partitioned table unique keys.
+ * A unique key is maintained for a partitioned table when:
+ * 1. The index is unique, as usual.
+ * 2. The index contains the partition key.
+ * 3. The same index exists on every child rel; see simple_indexinfo_equal
+ *    for how we compare them.
+ */
+
+/* index_constains_partkey
+ * Return true if the index contains the partition key.
+ */
+static bool
+index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind)
+{
+	ListCell	*lc;
+	int	i;
+	Assert(IS_PARTITIONED_REL(partrel));
+
+	for(i = 0; i < partrel->part_scheme->partnatts; i++)
+	{
+		Node *part_expr = linitial(partrel->partexprs[i]);
+		bool found_in_index = false;
+		foreach(lc, ind->indextlist)
+		{
+			Expr *index_expr = lfirst_node(TargetEntry, lc)->expr;
+			if (equal(index_expr, part_expr))
+			{
+				found_in_index = true;
+				break;
+			}
+		}
+		if (!found_in_index)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * simple_indexinfo_equal
+ *
+ * Check whether two indexes are the same. The indexes here are COPIED from
+ * the childrels, with some small changes (see simple_copy_indexinfo_to_parent).
+ */
+static bool
+simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2)
+{
+	Size oid_cmp_len = sizeof(Oid) * ind1->ncolumns;
+	return ind1->ncolumns == ind2->ncolumns &&
+		ind1->unique == ind2->unique &&
+		memcmp(ind1->indexkeys, ind2->indexkeys, sizeof(int) * ind1->ncolumns) == 0 &&
+		memcmp(ind1->opfamily, ind2->opfamily, oid_cmp_len) == 0 &&
+		memcmp(ind1->opcintype, ind2->opcintype, oid_cmp_len) == 0 &&
+		memcmp(ind1->sortopfamily, ind2->sortopfamily, oid_cmp_len) == 0 &&
+		equal(ind1->indextlist, ind2->indextlist);
+}
+
+/*
+ * These macros are copied from copyfuncs.c. simple_copy_indexinfo_to_parent
+ * is such a customized copy that it is not worth making public there.
+ */
+
+#define COPY_POINTER_FIELD(fldname, sz) \
+	do { \
+		Size	_size = (sz); \
+		newnode->fldname = palloc(_size); \
+		memcpy(newnode->fldname, from->fldname, _size); \
+	} while (0)
+
+#define COPY_NODE_FIELD(fldname) \
+	(newnode->fldname = copyObjectImpl(from->fldname))
+
+#define COPY_SCALAR_FIELD(fldname) \
+	(newnode->fldname = from->fldname)
+
+
+/*
+ * simple_copy_indexinfo_to_parent
+ * Copy the IndexOptInfo from a child rel to the parent, which will be used to
+ * 1. Test whether the same index exists in all the childrels.
+ * 2. Test whether the parentrel's reltarget/baserestrictinfo matches this index.
+ * The copied and modified index is only used within this scope.
+ */
+static IndexOptInfo *
+simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+								IndexOptInfo *from)
+{
+	IndexOptInfo *newnode = makeNode(IndexOptInfo);
+
+	COPY_SCALAR_FIELD(ncolumns);
+	COPY_SCALAR_FIELD(nkeycolumns);
+	COPY_SCALAR_FIELD(unique);
+	COPY_SCALAR_FIELD(immediate);
+	/* For indpred we only need to know whether it is NIL or not */
+	COPY_SCALAR_FIELD(indpred);
+	COPY_SCALAR_FIELD(predOK);
+	COPY_POINTER_FIELD(indexkeys, from->ncolumns * sizeof(int));
+	COPY_POINTER_FIELD(indexcollations, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opfamily, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opcintype, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(sortopfamily, from->ncolumns * sizeof(Oid));
+	COPY_NODE_FIELD(indextlist);
+
+	/*
+	 * Translate the Vars to the parent's relid so that the later index
+	 * matching (e.g. against "pk = 1" or rel->reltarget) works.
+	 */
+	ChangeVarNodes((Node*) newnode->indextlist,
+				   from->rel->relid,
+				   parentrel->relid, 0);
+	newnode->rel = parentrel;
+	return newnode;
+}
+
+/*
+ * adjust_partition_unique_indexlist
+ *
+ * Check every index in *global_unique_indexes against the given childrel;
+ * any index which does not also exist on the childrel is removed from the
+ * list.
+ */
+static void
+adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+								  RelOptInfo *childrel,
+								  List **global_unique_indexes)
+{
+	ListCell	*lc, *lc2;
+	foreach(lc, *global_unique_indexes)
+	{
+		IndexOptInfo	*g_ind = lfirst_node(IndexOptInfo, lc);
+		bool found_in_child = false;
+
+		foreach(lc2, childrel->indexlist)
+		{
+			IndexOptInfo   *p_ind = lfirst_node(IndexOptInfo, lc2);
+			IndexOptInfo   *p_ind_copy;
+			if (!p_ind->unique || !p_ind->immediate ||
+				(p_ind->indpred != NIL && !p_ind->predOK))
+				continue;
+			p_ind_copy = simple_copy_indexinfo_to_parent(parentrel, p_ind);
+			if (simple_indexinfo_equal(p_ind_copy, g_ind))
+			{
+				found_in_child = true;
+				break;
+			}
+		}
+
+		if (!found_in_child)
+			/* The same index doesn't exist on this childrel, so remove it */
+			*global_unique_indexes = foreach_delete_current(*global_unique_indexes, lc);
+	}
+}
+
+/* Helper function for groupres/distinctrel */
+static void
+add_uniquekey_from_sortgroups(PlannerInfo *root, RelOptInfo *rel, List *sortgroups)
+{
+	Query *parse = root->parse;
+	ListCell *lc;
+	List	*exprs = NIL,  *colnos = NIL;
+
+	/* XXX: If there are vars which do not belong to the current query level,
+	 * the semantics are imprecise; should we avoid that? levelsup = 1 is just
+	 * a demo; maybe we need to check every level other than 0, and if so we
+	 * need to write another pull_var_walker.
+	 */
+	List	*upper_vars = pull_vars_of_level((Node*)sortgroups, 1);
+
+	if (upper_vars != NIL)
+		return;
+
+	foreach(lc, sortgroups)
+	{
+		Index sortref = lfirst_node(SortGroupClause, lc)->tleSortGroupRef;
+		int c = 1;
+		ListCell *lc2;
+
+		/* Use a separate ListCell so we don't clobber the outer loop's lc */
+		foreach(lc2, parse->targetList)
+		{
+			TargetEntry *tle = lfirst_node(TargetEntry, lc2);
+			if (tle->ressortgroupref == sortref)
+			{
+				exprs = lappend(exprs, tle->expr);
+				colnos = lappend_int(colnos, c);
+			}
+			++c;
+		}
+	}
+	rel->uniquekeys = lappend(rel->uniquekeys,
+							  makeUniqueKey(exprs,
+											colnos,
+											false, /* sortgroupclause can't be multi_nullvals */
+											relation_is_onerow(rel) /* should always be false */
+								  ));
+}
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0ff660284..f8491e2bcf 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -439,6 +439,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
 	 * There may be references to the rel in root->fkey_list, but if so,
 	 * match_foreign_keys_to_quals() will get rid of them.
 	 */
+
+	root->all_baserels = bms_del_member(root->all_baserels, relid);
 }
 
 /*
@@ -574,49 +576,11 @@ reduce_unique_semijoins(PlannerInfo *root)
  * rel_supports_distinctness
  *		Could the relation possibly be proven distinct on some set of columns?
  *
- * This is effectively a pre-checking function for rel_is_distinct_for().
- * It must return true if rel_is_distinct_for() could possibly return true
- * with this rel, but it should not expend a lot of cycles.  The idea is
- * that callers can avoid doing possibly-expensive processing to compute
- * rel_is_distinct_for()'s argument lists if the call could not possibly
- * succeed.
  */
 static bool
 rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 {
-	/* We only know about baserels ... */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * For a plain relation, we only know how to prove uniqueness by
-		 * reference to unique indexes.  Make sure there's at least one
-		 * suitable unique index.  It must be immediately enforced, and if
-		 * it's a partial index, it must match the query.  (Keep these
-		 * conditions in sync with relation_has_unique_index_for!)
-		 */
-		ListCell   *lc;
-
-		foreach(lc, rel->indexlist)
-		{
-			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
-
-			if (ind->unique && ind->immediate &&
-				(ind->indpred == NIL || ind->predOK))
-				return true;
-		}
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Query	   *subquery = root->simple_rte_array[rel->relid]->subquery;
-
-		/* Check if the subquery has any qualities that support distinctness */
-		if (query_supports_distinctness(subquery))
-			return true;
-	}
-	/* We have no proof rules for any other rtekinds. */
-	return false;
+	return rel->uniquekeys != NIL;
 }
 
 /*
@@ -640,83 +604,33 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 static bool
 rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
 {
-	/*
-	 * We could skip a couple of tests here if we assume all callers checked
-	 * rel_supports_distinctness first, but it doesn't seem worth taking any
-	 * risk for.
-	 */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * Examine the indexes to see if we have a matching unique index.
-		 * relation_has_unique_index_for automatically adds any usable
-		 * restriction clauses for the rel, so we needn't do that here.
-		 */
-		if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
-			return true;
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Index		relid = rel->relid;
-		Query	   *subquery = root->simple_rte_array[relid]->subquery;
-		List	   *colnos = NIL;
-		List	   *opids = NIL;
-		ListCell   *l;
 
-		/*
-		 * Build the argument lists for query_is_distinct_for: a list of
-		 * output column numbers that the query needs to be distinct over, and
-		 * a list of equality operators that the output columns need to be
-		 * distinct according to.
-		 *
-		 * (XXX we are not considering restriction clauses attached to the
-		 * subquery; is that worth doing?)
-		 */
-		foreach(l, clause_list)
+	ListCell	*lc1, *lc2, *lc3;
+	foreach(lc1,  rel->uniquekeys)
+	{
+		UniqueKey *uqk = lfirst_node(UniqueKey, lc1);
+		bool all_uqk_exprs_found = true;
+		foreach(lc2, uqk->exprs)
 		{
-			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
-			Oid			op;
-			Var		   *var;
-
-			/*
-			 * Get the equality operator we need uniqueness according to.
-			 * (This might be a cross-type operator and thus not exactly the
-			 * same operator the subquery would consider; that's all right
-			 * since query_is_distinct_for can resolve such cases.)  The
-			 * caller's mergejoinability test should have selected only
-			 * OpExprs.
-			 */
-			op = castNode(OpExpr, rinfo->clause)->opno;
-
-			/* caller identified the inner side for us */
-			if (rinfo->outer_is_left)
-				var = (Var *) get_rightop(rinfo->clause);
-			else
-				var = (Var *) get_leftop(rinfo->clause);
-
-			/*
-			 * We may ignore any RelabelType node above the operand.  (There
-			 * won't be more than one, since eval_const_expressions() has been
-			 * applied already.)
-			 */
-			if (var && IsA(var, RelabelType))
-				var = (Var *) ((RelabelType *) var)->arg;
-
-			/*
-			 * If inner side isn't a Var referencing a subquery output column,
-			 * this clause doesn't help us.
-			 */
-			if (!var || !IsA(var, Var) ||
-				var->varno != relid || var->varlevelsup != 0)
-				continue;
-
-			colnos = lappend_int(colnos, var->varattno);
-			opids = lappend_oid(opids, op);
+			Node *uq_expr = lfirst(lc2);
+			bool find_uq_exprs_in_clause_list = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *clause_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				if (equal(uq_expr, clause_expr))
+				{
+					find_uq_exprs_in_clause_list = true;
+					break;
+				}
+			}
+			if (!find_uq_exprs_in_clause_list)
+			{
+				all_uqk_exprs_found = false;
+				break;
+			}
 		}
-
-		if (query_is_distinct_for(subquery, colnos, opids))
+		if (all_uqk_exprs_found)
 			return true;
 	}
 	return false;
@@ -972,6 +886,9 @@ innerrel_is_unique(PlannerInfo *root,
 	MemoryContext old_context;
 	ListCell   *lc;
 
+	if (relation_is_onerow(innerrel))
+		return true;
+
 	/* Certainly can't prove uniqueness when there are no joinclauses */
 	if (restrictlist == NIL)
 		return false;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e978b491f6..ca6303915c 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -830,6 +830,16 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		{
 			Node	   *qual = (Node *) lfirst(l);
 
+			/* set the not null info now */
+			ListCell	*lc;
+			List		*non_nullable_vars = find_nonnullable_vars(qual);
+			foreach(lc, non_nullable_vars)
+			{
+				Var *var = lfirst_node(Var, lc);
+				RelOptInfo *rel = root->simple_rel_array[var->varno];
+				rel->notnullattrs = bms_add_member(rel->notnullattrs,
+												   var->varattno - FirstLowInvalidHeapAttributeNumber);
+			}
 			distribute_qual_to_rels(root, qual,
 									false, below_outer_join, JOIN_INNER,
 									root->qual_security_level,
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..6ad73cb57b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -213,19 +213,6 @@ query_planner(PlannerInfo *root,
 	 */
 	fix_placeholder_input_needed_levels(root);
 
-	/*
-	 * Remove any useless outer joins.  Ideally this would be done during
-	 * jointree preprocessing, but the necessary information isn't available
-	 * until we've built baserel data structures and classified qual clauses.
-	 */
-	joinlist = remove_useless_joins(root, joinlist);
-
-	/*
-	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
-	 * Likewise, this can't be done until now for lack of needed info.
-	 */
-	reduce_unique_semijoins(root);
-
 	/*
 	 * Now distribute "placeholders" to base rels as needed.  This has to be
 	 * done after join removal because removal could change whether a
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b31c52404a..6f1d9b6b2f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2389,6 +2389,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		add_path(final_rel, path);
 	}
 
+	simple_copy_uniquekeys(current_rel, final_rel);
+
 	/*
 	 * Generate partial paths for final_rel, too, if outer query levels might
 	 * be able to make use of them.
@@ -3901,6 +3903,8 @@ create_grouping_paths(PlannerInfo *root,
 	}
 
 	set_cheapest(grouped_rel);
+
+	populate_grouprel_uniquekeys(root, grouped_rel);
 	return grouped_rel;
 }
 
@@ -4618,7 +4622,7 @@ create_window_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(window_rel);
-
+	simple_copy_uniquekeys(input_rel, window_rel);
 	return window_rel;
 }
 
@@ -4915,7 +4919,7 @@ create_distinct_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(distinct_rel);
-
+	populate_distinctrel_uniquekeys(root, input_rel, distinct_rel);
 	return distinct_rel;
 }
 
@@ -5176,6 +5180,8 @@ create_ordered_paths(PlannerInfo *root,
 	 */
 	Assert(ordered_rel->pathlist != NIL);
 
+	simple_copy_uniquekeys(input_rel, ordered_rel);
+
 	return ordered_rel;
 }
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 951aed80e7..8aa4d24cb0 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -689,6 +689,8 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	/* Undo effects of possibly forcing tuple_fraction to 0 */
 	root->tuple_fraction = save_fraction;
 
+	/* Add the UniqueKeys */
+	populate_unionrel_uniquiekeys(root, result_rel);
 	return result_rel;
 }
 
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 25545029d7..0b2f9d398a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -463,6 +464,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->notnullattrs == NULL);
+	for(i = 0; i < relation->rd_att->natts; i++)
+	{
+		Form_pg_attribute attr = TupleDescAttr(relation->rd_att, i);
+		if (attr->attnotnull)
+			rel->notnullattrs = bms_add_member(rel->notnullattrs,
+											   attr->attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 31d9aedeeb..fd050249d2 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -16,6 +16,7 @@
 
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 
 extern A_Expr *makeA_Expr(A_Expr_Kind kind, List *name,
@@ -105,4 +106,6 @@ extern GroupingSet *makeGroupingSet(GroupingSetKind kind, List *content, int loc
 
 extern VacuumRelation *makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols);
 
+extern UniqueKey* makeUniqueKey(List *exprs, List *positions,
+								bool multi_nullvals, bool onerow);
 #endif							/* MAKEFUNC_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b4e4..41110ed888 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -264,6 +264,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1c83772d62..f7f29f8018 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -709,6 +709,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	/* Not-null attrs, stored as attnum - FirstLowInvalidHeapAttributeNumber */
+	Bitmapset		*notnullattrs;
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
@@ -728,6 +730,7 @@ typedef struct RelOptInfo
 	QualCost	baserestrictcost;	/* cost of evaluating the above */
 	Index		baserestrict_min_security;	/* min security_level found in
 											 * baserestrictinfo */
+	List	   *uniquekeys;		/* List of UniqueKey */
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
 	bool		has_eclass_joins;	/* T means joininfo is incomplete */
@@ -1045,6 +1048,35 @@ typedef struct PathKey
 } PathKey;
 
 
+/*
+ * UniqueKey
+ *
+ * Represents the unique properties held by a RelOptInfo.
+ *
+ * exprs is a list of exprs on which the current RelOptInfo is unique.
+ * positions is a list of the locations of the corresponding exprs in the
+ * current reloptinfo->reltarget, used when we translate a UniqueKey from a
+ * subquery into the outer query.
+ * multi_nullvals: true means multiple null values may exist in these exprs,
+ * so uniqueness is not guaranteed in that case. This field is needed for
+ * remove_useless_join & reduce_unique_semijoins, where we don't mind such
+ * duplicated NULL values. It is set to true in 2 cases: a unique key built
+ * from a unique index whose columns are nullable, and the outer join case;
+ * see populate_joinrel_uniquekeys for details.
+ * onerow means the relation returns at most one row, e.g. a filter on a
+ * unique index, an aggregate without a GROUP BY, or a join of two one-row
+ * relations. As an optimization, when onerow is true we do not record every
+ * expr as a UniqueKey; we store exprs as NIL instead.
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+	List	   *exprs;
+	List	   *positions;
+	bool		multi_nullvals;
+	bool		onerow;
+} UniqueKey;
+
 /*
  * PathTarget
  *
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 14ea2766ad..621f54a9f8 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -528,6 +528,8 @@ extern bool list_member_ptr(const List *list, const void *datum);
 extern bool list_member_int(const List *list, int datum);
 extern bool list_member_oid(const List *list, Oid datum);
 
+extern bool list_is_subset(const List *members, const List *target);
+
 extern List *list_delete(List *list, void *datum);
 extern List *list_delete_ptr(List *list, void *datum);
 extern List *list_delete_int(List *list, int datum);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 10b6e81079..d12e5295c3 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -241,4 +241,38 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ */
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel,
+										List* unique_index_list);
+extern void populate_partitionedrel_uniquekeys(PlannerInfo *root,
+												RelOptInfo *rel,
+												List *childrels);
+extern void populate_distinctrel_uniquekeys(PlannerInfo *root,
+											RelOptInfo *inputrel,
+											RelOptInfo *distinctrel);
+extern void populate_grouprel_uniquekeys(PlannerInfo *root,
+										 RelOptInfo *grouprel);
+extern void populate_unionrel_uniquiekeys(PlannerInfo *root,
+										  RelOptInfo *unionrel);
+extern void simple_copy_uniquekeys(RelOptInfo *oldrel,
+								   RelOptInfo *newrel);
+extern void convert_subquery_uniquekeys(PlannerInfo *root,
+										RelOptInfo *currel,
+										RelOptInfo *sub_final_rel);
+extern void populate_joinrel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *rel1,
+										RelOptInfo *rel2,
+										List *restrictlist,
+										JoinType jointype);
+
+extern bool relation_has_uniquekeys_for(PlannerInfo *root,
+										RelOptInfo *rel,
+										List *exprs);
+extern bool relation_is_onerow(RelOptInfo *rel);
+
 #endif							/* PATHS_H */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 14cdcfcca6..7b574efb0c 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1109,13 +1109,13 @@ explain (costs off) select a,c from t1 group by a,c,d;
 (3 rows)
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
                       QUERY PLAN                      
 ------------------------------------------------------
  HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.y
+   Group Key: t1.a, t1.c, t1.d, t2.x, t2.y
    ->  Hash Join
          Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
          ->  Seq Scan on t2
@@ -1125,13 +1125,13 @@ group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
                       QUERY PLAN                      
 ------------------------------------------------------
  HashAggregate
    Group Key: t1.a, t1.b, t2.x, t2.z
-   ->  Hash Join
+   ->  Hash Left Join
          Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
          ->  Seq Scan on t2
          ->  Hash
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 761376b007..1cb23d6fee 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4349,11 +4349,11 @@ reset enable_nestloop;
 --
 begin;
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 -- all three cases should be optimizable into a simple seqscan
@@ -4411,40 +4411,37 @@ select d.* from d left join (select distinct * from b) s
 (1 row)
 
 -- join removal is not possible when the GROUP BY contains a column that is
--- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
--- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
--- but this happens too late for join removal in the outer plan level.)
+-- not in the join condition.
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
-                QUERY PLAN                
-------------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Group
-         Group Key: b.id
-         ->  Index Scan using b_pkey on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.d, b.c_id
+                     ->  Seq Scan on b
 (8 rows)
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(9 rows)
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.c_id, b.d
+                     ->  Seq Scan on b
+(8 rows)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 9480abd577..3446c3e9fd 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -394,13 +394,13 @@ explain (costs off) select * from t1 group by a,b,c,d;
 explain (costs off) select a,c from t1 group by a,c,d;
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
 
 -- Cannot optimize when PK is deferrable
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 5fc6617369..81de684212 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1479,11 +1479,11 @@ reset enable_nestloop;
 begin;
 
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 
@@ -1512,17 +1512,15 @@ select d.* from d left join (select distinct * from b) s
   on d.a = s.id and d.b = s.c_id;
 
 -- join removal is not possible when the GROUP BY contains a column that is
--- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
--- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
--- but this happens too late for join removal in the outer plan level.)
+-- not in the join condition.
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
-- 
2.21.0

#19Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andy Fan (#18)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Thu, Apr 16, 2020 at 7:47 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

(9 rows)

With this feature:
explain analyze select a, sum(c) from grp2 group by a;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=0.00..553031.57 rows=10000023 width=12) (actual time=0.044..13209.485 rows=10000000 loops=1)
Group Key: a
-> Seq Scan on grp2 (cost=0.00..403031.23 rows=10000023 width=8) (actual time=0.023..4938.171 rows=10000000 loops=1)
Planning Time: 0.400 ms
Execution Time: 13749.121 ms
(5 rows)

Applying the patch gives a white space warning
git am /tmp/v6-000*
Applying: Introduce UniqueKeys to determine RelOptInfo unique properties
.git/rebase-apply/patch:545: indent with spaces.
/* Fast path */
warning: 1 line adds whitespace errors.
Applying: Skip DISTINCT / GROUP BY if input is already unique

Compiling the patch causes one warning
nodeAgg.c:2134:3: warning: enumeration value ‘AGG_UNIQUE’ not handled
in switch [-Wswitch]

I have not looked at the patch. The numbers above look good. The time
spent in summing up a column in each row (we are summing only one
number per group) is twice the time it took to read those rows from
the table. That looks odd, but it may be something unrelated to
your patch. I also observed that for explain analyze select a from
grp2 group by a; we just produce a plan containing seq scan node,
which is a good thing.

--
Best Wishes,
Ashutosh Bapat

#20Andy Fan
zhihui.fan1213@gmail.com
In reply to: Ashutosh Bapat (#19)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Thu, Apr 16, 2020 at 8:36 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
wrote:

On Thu, Apr 16, 2020 at 7:47 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

(9 rows)

With this feature:
explain analyze select a, sum(c) from grp2 group by a;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=0.00..553031.57 rows=10000023 width=12) (actual time=0.044..13209.485 rows=10000000 loops=1)
Group Key: a
-> Seq Scan on grp2 (cost=0.00..403031.23 rows=10000023 width=8) (actual time=0.023..4938.171 rows=10000000 loops=1)
Planning Time: 0.400 ms
Execution Time: 13749.121 ms
(5 rows)

Applying the patch gives a white space warning
git am /tmp/v6-000*
Applying: Introduce UniqueKeys to determine RelOptInfo unique properties
.git/rebase-apply/patch:545: indent with spaces.
/* Fast path */
warning: 1 line adds whitespace errors.
Applying: Skip DISTINCT / GROUP BY if input is already unique

Compiling the patch causes one warning
nodeAgg.c:2134:3: warning: enumeration value ‘AGG_UNIQUE’ not handled
in switch [-Wswitch]

Thanks, I will fix them together with any detailed review suggestions.
(I know the review needs lots of time, so I appreciate it.)

I have not looked at the patch. The numbers above look good. The time
spent in summing up a column in each row (we are summing only one
number per group) is twice the time it took to read those rows from
the table. That looks odd, but it may be something unrelated to
your patch. I also observed that for explain analyze select a from
grp2 group by a; we just produce a plan containing seq scan node,
which is a good thing.

Great and welcome back Ashutosh:)


--
Best Wishes,
Ashutosh Bapat

#21David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#18)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Thu, 16 Apr 2020 at 14:17, Andy Fan <zhihui.fan1213@gmail.com> wrote:

V6 also includes:
1. Fix the comment misleading you mentioned above.
2. Fixed a concern case for `relation_has_uniquekeys_for` function.

Over on [1], Richard highlights a problem with the current join removal
code's inability to remove left joins unless the min_righthand side of
the join is a singleton rel. It's my understanding that the reason the
code checks for this is down to the fact that join removal used
unique indexes to prove the uniqueness of the relation and obviously,
those can only exist on base relations. I wondered if you might want
to look into a 0003 patch which removes that restriction? I think this
can be done now since we no longer look at unique indexes to provide
the proof that the join to be removed won't duplicate outer-side
rows.
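
The check David describes can be sketched as a toy C model (hypothetical
names, bitmask column sets instead of real planner structures; this is
not the patch's code): a left join's inner side becomes removable once
some unique key of the inner rel is wholly covered by the join clauses
and no inner column is referenced above the join — and with UniqueKey
that inner side no longer has to be a base rel with a unique index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not planner code: each unique key is a bitmask of the
 * inner rel's column numbers. */
typedef struct ModelRel
{
	uint32_t	uniquekeys[4];	/* up to 4 unique keys */
	int			nkeys;
} ModelRel;

/*
 * A left join is removable when no inner column is needed above the
 * join and the join clauses mention all columns of some unique key, so
 * each outer row matches at most one inner row.
 */
static bool
left_join_is_removable(const ModelRel *inner,
					   uint32_t joinclause_cols,
					   uint32_t needed_above_cols)
{
	if (needed_above_cols != 0)
		return false;			/* inner output still referenced */

	for (int i = 0; i < inner->nkeys; i++)
	{
		/* all columns of this key appear in the join clauses? */
		if ((inner->uniquekeys[i] & ~joinclause_cols) == 0)
			return true;
	}
	return false;
}
```

Since the uniquekeys here may come from a GROUP BY, DISTINCT or UNION
subquery rather than an index, nothing ties the check to base relations.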

David

[1]: /messages/by-id/CAMbWs4-THacv3DdMpiTrvg5ZY7sNViFF1pTU=kOKmtPBrE9-0Q@mail.gmail.com

#22Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#21)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Wed, Apr 29, 2020 at 8:29 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 16 Apr 2020 at 14:17, Andy Fan <zhihui.fan1213@gmail.com> wrote:

V6 also includes:
1. Fix the comment misleading you mentioned above.
2. Fixed a concern case for `relation_has_uniquekeys_for` function.

Over on [1], Richard highlights a problem with the current join removal
code's inability to remove left joins unless the min_righthand side of
the join is a singleton rel. It's my understanding that the reason the
code checks for this is down to the fact that join removal used
unique indexes to prove the uniqueness of the relation and obviously,
those can only exist on base relations. I wondered if you might want
to look into a 0003 patch which removes that restriction? I think this
can be done now since we no longer look at unique indexes to provide
the proof that the join to be removed won't duplicate outer-side
rows.

Yes, I think that would be another benefit of UniqueKey, but it hasn't
happened until now. I will take a look at it today and fix it in a
separate commit.

Best Regards
Andy Fan

#23Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#22)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Wed, Apr 29, 2020 at 8:34 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Wed, Apr 29, 2020 at 8:29 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 16 Apr 2020 at 14:17, Andy Fan <zhihui.fan1213@gmail.com> wrote:

V6 also includes:
1. Fix the comment misleading you mentioned above.
2. Fixed a concern case for `relation_has_uniquekeys_for` function.

Over on [1], Richard highlights a problem with the current join removal
code's inability to remove left joins unless the min_righthand side of
the join is a singleton rel. It's my understanding that the reason the
code checks for this is down to the fact that join removal used
unique indexes to prove the uniqueness of the relation and obviously,
those can only exist on base relations. I wondered if you might want
to look into a 0003 patch which removes that restriction? I think this
can be done now since we no longer look at unique indexes to provide
the proof that the join to be removed won't duplicate outer-side
rows.

Yes, I think that would be another benefit of UniqueKey, but it hasn't
happened until now. I will take a look at it today and fix it in a
separate commit.

I have made it work locally. The basic idea is to postpone the join
removal to the build_join_rel stage, where the uniquekey info is well
maintained. I will test more and send a product-ready patch tomorrow.

# explain (costs off) select a.i from a left join b on a.i = b.i and
b.j in (select j from c);
QUERY PLAN
---------------
Seq Scan on a
(1 row)

Best Regard
Andy Fan

#24Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#23)
6 attachment(s)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

I just uploaded the v7 version and split it into smaller commits for easier
review/merge. I also maintain an up-to-date README.uniquekey
document, since some things may change during discussion or later coding.

Here is the simple introduction of each commit.

====
1. v7-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch

This commit adds notnullattrs to RelOptInfo, gathering the information
from both the catalog and the user's query.

2. v7-0002-Introuduce-RelOptInfo.uniquekeys-attribute.patch

This commit just adds the uniquekeys to RelOptInfo and maintains them at
every stage. However, the upper-level code is not changed by this.

Some changes in this part in v7:
1). Removed the UniqueKey.positions attribute. In the past it was used in
convert_subquery_uniquekeys, but we don't actually need it (and I
maintained it wrongly before). Now I build the relationship between the
outer Var and the subquery's TargetList with outrel.subquery.processed_tlist.
2). A onerow UniqueKey (exprs = NIL) needs to be converted to a normal
UniqueKey (exprs != NIL) if it is not one-row any more. This may happen
with some outer joins.
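
The tlist-based conversion in item 1) can be sketched as a toy C model
(hypothetical names; plain varattnos stand in for the real Var nodes,
and this is not the patch's code): each column a subquery exposes gets
an output position, and a subquery UniqueKey survives into the outer
query only if every one of its columns is exposed, the outer varattno
being that 1-based position. This matches the t2 example earlier in the
thread, where Var(varattno=2) inside the subquery becomes varattno 1
outside.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not the patch's code: tlist[j] is the subquery varattno
 * exposed at output position j+1.  Convert a subquery unique key (an
 * array of subquery varattnos) into outer-query varattnos, which are
 * simply the output positions.  Fails if a key column is not exposed,
 * since then the outer query cannot see the uniqueness.
 */
static bool
convert_subquery_uniquekey(const int *tlist, int tlist_len,
						   const int *key_attnos, int key_len,
						   int *outer_attnos)
{
	for (int i = 0; i < key_len; i++)
	{
		bool		found = false;

		for (int j = 0; j < tlist_len; j++)
		{
			if (tlist[j] == key_attnos[i])
			{
				outer_attnos[i] = j + 1;	/* 1-based position */
				found = true;
				break;
			}
		}
		if (!found)
			return false;		/* key column not visible outside */
	}
	return true;
}
```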

3. v7-0003-Refactor-existing-uniqueness-related-code-to-use-.patch

Refactor the existing functions like innerrel_is_unique/rel_is_distinct_for
to use UniqueKey, and postpone the calls of remove_useless_join and
reduce_unique_semijoins so they use the new implementation.

4. v7-0004-Remove-distinct-node-AggNode-if-the-input-is-uniq.patch

Remove the Distinct node if the result is already distinct. Remove the
Agg node if the GROUP BY clause is already unique AND there is no
aggregate function in the query.
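
The check behind this removal can be sketched as a toy C model
(hypothetical names, bitmasks instead of expression lists; not the
patch's code): DISTINCT or a plain GROUP BY is redundant when some
UniqueKey's expressions are a subset of the clause's expressions and the
key cannot produce duplicate rows through NULLs, which is roughly what
relation_has_uniquekeys_for has to decide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a UniqueKey: exprs as a bitmask of candidate clause
 * expressions, plus the multi_nullvals flag from the struct upthread. */
typedef struct ModelUniqueKey
{
	uint32_t	exprs;
	bool		multi_nullvals; /* may repeat because of NULLs? */
} ModelUniqueKey;

/*
 * The clause already yields unique rows if some unique key is fully
 * contained in the clause's expressions and NULLs cannot duplicate it.
 */
static bool
clause_is_already_unique(const ModelUniqueKey *keys, int nkeys,
						 uint32_t clause_exprs)
{
	for (int i = 0; i < nkeys; i++)
	{
		if (!keys[i].multi_nullvals &&
			(keys[i].exprs & ~clause_exprs) == 0)
			return true;
	}
	return false;
}
```

A wider clause can only make the subset test easier, which is why
grouping by a primary key plus extra columns is still removable.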

5. v7-0005-If-the-group-by-clause-is-unique-and-we-have-aggr.patch

If the GROUP BY clause is unique and the query has an aggregate function,
we use the AGG_SORTED strategy, but without a real sort, since each group
has only one row.

6. v7-0006-Join-removal-at-run-time-with-UniqueKey.patch

This commit runs join removal at build_join_rel. At that time, it can make
full use of UniqueKeys. It can handle some more cases; I added some new
test cases to join.sql. However, it can't be a full replacement of the
current one. There are some cases the current strategy can handle well but
the new one can't, like:

SELECT a.* FROM a LEFT JOIN (b left join c on b.c_id = c.id) ON (a.b_id =
b.id);

During the join of a & b, the join can't be removed since b.id is still
useful later. Only later do we learn that b.id can be removed as well, but
by then it is too late to remove the earlier join.

On the implementation side, the main idea is: if the join can be removed,
we copy the pathlist from outerrel to joinrel. There are several items to
handle:

1. To keep the overall join_search_one_level working, we have to keep the
joinrel even when the innerrel is removed (rather than discard the
joinrel).
2. If the innerrel can be removed, we don't need to build a pathlist for
joinrel; we just reuse the pathlist from outerrel. However, there are many
places which assert rel->pathlist[*]->parent == rel, so since I copied the
pathlist, we have to change the parent to joinrel.
3. During create_plan, some paths on an RTE_RELATION need to know the
relation Oid via path->parent->relid, so we have to use outerrel->relid
to overwrite joinrel->relid, which was 0 before.
4. For almost the same paths as in item 3, the code usually asserts
best_path->parent->rtekind == RTE_RELATION; now such a path may appear in
a joinrel, so I used outerrel->rtekind to overwrite joinrel->rtekind.
5. I guess there are some dependencies between path->pathtarget and
rel->reltarget. Since we reuse the pathlist of outerrel, I used
outerrel->reltarget as well. If the join can be removed, I guess
list_length(outerrel->reltarget->exprs) >= list_length(joinrel->reltarget->exprs);
we can rely on a ProjectionPath to reduce the tlist.

My patches are based on the current latest commit, fb544735f1.

Best Regards
Andy Fan


Attachments:

v7-0005-If-the-group-by-clause-is-unique-and-we-have-aggr.patch (application/octet-stream)
From b4945b54ea16a1b75c4314cb5406537edaf31176 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 6 May 2020 16:32:28 +0800
Subject: [PATCH v7 5/6] If the group by clause is unique and we have
 aggregation function,

We treat the input as sorted without an explicit sort since each
group has only 1 row.
---
 src/backend/commands/explain.c           |   4 +
 src/backend/executor/nodeAgg.c           |   6 ++
 src/backend/optimizer/plan/createplan.c  |   3 +-
 src/backend/optimizer/plan/planner.c     | 127 +++++++++++++++--------
 src/include/nodes/nodes.h                |   3 +-
 src/include/nodes/pathnodes.h            |   1 +
 src/include/nodes/plannodes.h            |   1 +
 src/test/regress/expected/aggregates.out |  42 ++++++++
 src/test/regress/sql/aggregates.sql      |  17 +++
 9 files changed, 159 insertions(+), 45 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 1275bec673..cdd3c0ec16 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1929,6 +1929,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			show_agg_keys(castNode(AggState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			show_hashagg_info((AggState *) planstate, es);
+			if (es->format != EXPLAIN_FORMAT_TEXT ||
+				(es->verbose && ((Agg *) plan)->input_unique))
+				ExplainPropertyBool("Input Unique",
+									((Agg *) plan)->input_unique, es);
 			if (plan->qual)
 				show_instrumentation_count("Rows Removed by Filter", 1,
 										   planstate, es);
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 9f4229de60..2d5493c744 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -2145,6 +2145,12 @@ ExecAgg(PlanState *pstate)
 			case AGG_SORTED:
 				result = agg_retrieve_direct(node);
 				break;
+			case AGG_UNIQUE:
+				/* AGG_UNIQUE is translated to AGG_SORTED, Handle it here
+				 * to make compiler quiet.
+				 */
+				Assert(false);
+				break;
 		}
 
 		if (!TupIsNull(result))
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 9941dfe65e..0049d22227 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -6353,7 +6353,7 @@ make_agg(List *tlist, List *qual,
 	/* Reduce to long, but 'ware overflow! */
 	numGroups = (long) Min(dNumGroups, (double) LONG_MAX);
 
-	node->aggstrategy = aggstrategy;
+	node->aggstrategy = aggstrategy == AGG_UNIQUE ? AGG_SORTED : aggstrategy;
 	node->aggsplit = aggsplit;
 	node->numCols = numGroupCols;
 	node->grpColIdx = grpColIdx;
@@ -6364,6 +6364,7 @@ make_agg(List *tlist, List *qual,
 	node->aggParams = NULL;		/* SS_finalize_plan() will fill this */
 	node->groupingSets = groupingSets;
 	node->chain = chain;
+	node->input_unique = aggstrategy == AGG_UNIQUE;
 
 	plan->qual = qual;
 	plan->targetlist = tlist;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 52fa7c5a45..8ac82e727b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3852,51 +3852,61 @@ create_grouping_paths(PlannerInfo *root,
 		int			flags = 0;
 		GroupPathExtraData extra;
 
-		/*
-		 * Determine whether it's possible to perform sort-based
-		 * implementations of grouping.  (Note that if groupClause is empty,
-		 * grouping_is_sortable() is trivially true, and all the
-		 * pathkeys_contained_in() tests will succeed too, so that we'll
-		 * consider every surviving input path.)
-		 *
-		 * If we have grouping sets, we might be able to sort some but not all
-		 * of them; in this case, we need can_sort to be true as long as we
-		 * must consider any sorted-input plan.
-		 */
-		if ((gd && gd->rollups != NIL)
-			|| grouping_is_sortable(parse->groupClause))
-			flags |= GROUPING_CAN_USE_SORT;
+		if (group_unique_input)
+		{
+			/* In this case we don't need to set other flags */
+			Assert(parse->groupClause != NIL);
+			Assert(gd == NULL);
+			flags |= GROUPING_INPUT_UNIQUE;
+		}
+		else
+		{
+			/*
+			 * Determine whether it's possible to perform sort-based
+			 * implementations of grouping.  (Note that if groupClause is empty,
+			 * grouping_is_sortable() is trivially true, and all the
+			 * pathkeys_contained_in() tests will succeed too, so that we'll
+			 * consider every surviving input path.)
+			 *
+			 * If we have grouping sets, we might be able to sort some but not all
+			 * of them; in this case, we need can_sort to be true as long as we
+			 * must consider any sorted-input plan.
+			 */
+			if ((gd && gd->rollups != NIL)
+				|| grouping_is_sortable(parse->groupClause))
+				flags |= GROUPING_CAN_USE_SORT;
 
-		/*
-		 * Determine whether we should consider hash-based implementations of
-		 * grouping.
-		 *
-		 * Hashed aggregation only applies if we're grouping. If we have
-		 * grouping sets, some groups might be hashable but others not; in
-		 * this case we set can_hash true as long as there is nothing globally
-		 * preventing us from hashing (and we should therefore consider plans
-		 * with hashes).
-		 *
-		 * Executor doesn't support hashed aggregation with DISTINCT or ORDER
-		 * BY aggregates.  (Doing so would imply storing *all* the input
-		 * values in the hash table, and/or running many sorts in parallel,
-		 * either of which seems like a certain loser.)  We similarly don't
-		 * support ordered-set aggregates in hashed aggregation, but that case
-		 * is also included in the numOrderedAggs count.
-		 *
-		 * Note: grouping_is_hashable() is much more expensive to check than
-		 * the other gating conditions, so we want to do it last.
-		 */
-		if ((parse->groupClause != NIL &&
-			 agg_costs->numOrderedAggs == 0 &&
-			 (gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
-			flags |= GROUPING_CAN_USE_HASH;
+			/*
+			 * Determine whether we should consider hash-based implementations of
+			 * grouping.
+			 *
+			 * Hashed aggregation only applies if we're grouping. If we have
+			 * grouping sets, some groups might be hashable but others not; in
+			 * this case we set can_hash true as long as there is nothing globally
+			 * preventing us from hashing (and we should therefore consider plans
+			 * with hashes).
+			 *
+			 * Executor doesn't support hashed aggregation with DISTINCT or ORDER
+			 * BY aggregates.  (Doing so would imply storing *all* the input
+			 * values in the hash table, and/or running many sorts in parallel,
+			 * either of which seems like a certain loser.)  We similarly don't
+			 * support ordered-set aggregates in hashed aggregation, but that case
+			 * is also included in the numOrderedAggs count.
+			 *
+			 * Note: grouping_is_hashable() is much more expensive to check than
+			 * the other gating conditions, so we want to do it last.
+			 */
+			if ((parse->groupClause != NIL &&
+				 agg_costs->numOrderedAggs == 0 &&
+				 (gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
+				flags |= GROUPING_CAN_USE_HASH;
 
-		/*
-		 * Determine whether partial aggregation is possible.
-		 */
-		if (can_partial_agg(root, agg_costs))
-			flags |= GROUPING_CAN_PARTIAL_AGG;
+			/*
+			 * Determine whether partial aggregation is possible.
+			 */
+			if (can_partial_agg(root, agg_costs))
+				flags |= GROUPING_CAN_PARTIAL_AGG;
+		}
 
 		extra.flags = flags;
 		extra.target_parallel_safe = target_parallel_safe;
@@ -6521,9 +6531,40 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	ListCell   *lc;
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+	bool		group_input_unique = (extra->flags & GROUPING_INPUT_UNIQUE) != 0;
 	List	   *havingQual = (List *) extra->havingQual;
 	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
 
+	if (group_input_unique)
+	{
+		Path *path = input_rel->cheapest_total_path;
+		add_path(grouped_rel, (Path *) create_agg_path(root,
+													   grouped_rel,
+													   path,
+													   grouped_rel->reltarget,
+													   AGG_UNIQUE,
+													   AGGSPLIT_SIMPLE,
+													   parse->groupClause,
+													   havingQual,
+													   agg_costs,
+													   dNumGroups));
+
+		if (path != input_rel->cheapest_startup_path)
+		{
+			path = input_rel->cheapest_startup_path;
+			add_path(grouped_rel, (Path *) create_agg_path(root,
+														   grouped_rel,
+														   path,
+														   grouped_rel->reltarget,
+														   AGG_UNIQUE,
+														   AGGSPLIT_SIMPLE,
+														   parse->groupClause,
+														   havingQual,
+														   agg_costs,
+														   dNumGroups));
+		}
+		return;
+	}
 	if (can_sort)
 	{
 		/*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 41110ed888..010266ed4b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -761,7 +761,8 @@ typedef enum AggStrategy
 	AGG_PLAIN,					/* simple agg across all input rows */
 	AGG_SORTED,					/* grouped agg, input must be sorted */
 	AGG_HASHED,					/* grouped agg, use internal hashtable */
-	AGG_MIXED					/* grouped agg, hash and sort both used */
+	AGG_MIXED,					/* grouped agg, hash and sort both used */
+	AGG_UNIQUE					/* grouped agg, the group clause is unique */
 } AggStrategy;
 
 /*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 575353d86c..9f0cfaf094 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2475,6 +2475,7 @@ typedef struct JoinPathExtraData
 #define GROUPING_CAN_USE_SORT       0x0001
 #define GROUPING_CAN_USE_HASH       0x0002
 #define GROUPING_CAN_PARTIAL_AGG	0x0004
+#define GROUPING_INPUT_UNIQUE		0x0008
 
 /*
  * What kind of partitionwise aggregation is in use?
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 55f363f70c..e8300d9f37 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -828,6 +828,7 @@ typedef struct Agg
 	/* Note: planner provides numGroups & aggParams only in HASHED/MIXED case */
 	List	   *groupingSets;	/* grouping sets to use */
 	List	   *chain;			/* chained Agg/Sort nodes */
+	bool		input_unique;   /* The input is unique already */
 } Agg;
 
 /* ----------------
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 42bd180895..65912142f8 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2544,6 +2544,46 @@ set work_mem to default;
 ----+----+----
 (0 rows)
 
+create table agg_unique_1(pk int primary key,  b int);
+create table agg_unique_2(a int, unsortable_col xid);
+insert into agg_unique_2 values(1, '1'), (2, '2'), (2, '1');
+explain (costs off, verbose)  select pk, sum(b) from agg_unique_1
+group by pk;
+              QUERY PLAN               
+---------------------------------------
+ GroupAggregate
+   Output: pk, sum(b)
+   Group Key: agg_unique_1.pk
+   Input Unique: true
+   ->  Seq Scan on public.agg_unique_1
+         Output: pk, b
+(6 rows)
+
+explain (costs off, verbose) select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ GroupAggregate
+   Output: agg_unique_2.unsortable_col, count(*)
+   Group Key: agg_unique_2.unsortable_col
+   Input Unique: true
+   ->  HashAggregate
+         Output: agg_unique_2.unsortable_col
+         Group Key: agg_unique_2.unsortable_col
+         ->  Seq Scan on public.agg_unique_2
+               Output: agg_unique_2.a, agg_unique_2.unsortable_col
+(9 rows)
+
+select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+ unsortable_col | count 
+----------------+-------
+              2 |     1
+              1 |     1
+(2 rows)
+
 drop table agg_group_1;
 drop table agg_group_2;
 drop table agg_group_3;
@@ -2552,3 +2592,5 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_unique_1;
+drop table agg_unique_2;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 3446c3e9fd..3dc40b13d5 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1159,6 +1159,21 @@ set work_mem to default;
   union all
 (select * from agg_group_4 except select * from agg_hash_4);
 
+create table agg_unique_1(pk int primary key,  b int);
+create table agg_unique_2(a int, unsortable_col xid);
+insert into agg_unique_2 values(1, '1'), (2, '2'), (2, '1');
+
+explain (costs off, verbose)  select pk, sum(b) from agg_unique_1
+group by pk;
+
+explain (costs off, verbose) select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+
+select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+
 drop table agg_group_1;
 drop table agg_group_2;
 drop table agg_group_3;
@@ -1167,3 +1182,5 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_unique_1;
+drop table agg_unique_2;
-- 
2.21.0

Attachment: v7-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch (application/octet-stream)
From b0374f0e6e20bcef0206411ddeda6a6d1f61e528 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sun, 3 May 2020 22:37:46 +0800
Subject: [PATCH v7 1/6] Introduce RelOptInfo->notnullattrs attribute

---
 src/backend/optimizer/plan/initsplan.c | 10 ++++++++++
 src/backend/optimizer/util/plancat.c   | 10 ++++++++++
 src/include/nodes/pathnodes.h          |  2 ++
 3 files changed, 22 insertions(+)

diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e978b491f6..95b1b14cd3 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -830,6 +830,16 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		{
 			Node	   *qual = (Node *) lfirst(l);
 
+			/* Set the not null info now */
+			ListCell	*lc;
+			List		*non_nullable_vars = find_nonnullable_vars(qual);
+			foreach(lc, non_nullable_vars)
+			{
+				Var *var = lfirst_node(Var, lc);
+				RelOptInfo *rel = root->simple_rel_array[var->varno];
+				rel->notnullattrs = bms_add_member(rel->notnullattrs,
+												   var->varattno - FirstLowInvalidHeapAttributeNumber);
+			}
 			distribute_qual_to_rels(root, qual,
 									false, below_outer_join, JOIN_INNER,
 									root->qual_security_level,
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 25545029d7..0b2f9d398a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -463,6 +464,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->notnullattrs == NULL);
+	for(i = 0; i < relation->rd_att->natts; i++)
+	{
+		FormData_pg_attribute attr = relation->rd_att->attrs[i];
+		if (attr.attnotnull)
+			rel->notnullattrs = bms_add_member(rel->notnullattrs,
+											   attr.attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1c83772d62..907afd383d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -709,6 +709,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	/* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */
+	Bitmapset		*notnullattrs;
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
-- 
2.21.0

Attachment: v7-0004-Remove-distinct-node-AggNode-if-the-input-is-uniq.patch (application/octet-stream)
From d720c0b37b28fea3afa66233ab467afeabe9dd77 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 6 May 2020 16:04:10 +0800
Subject: [PATCH v7 4/6] Remove distinct node & AggNode if the input is unique
 already.

For the AggNode removal, we currently have to check that there is
no aggregate function in the targetlist.
---
 src/backend/optimizer/plan/planner.c          |  30 +-
 src/test/regress/expected/aggregates.out      |  73 ++--
 src/test/regress/expected/select_distinct.out | 335 ++++++++++++++++++
 src/test/regress/sql/aggregates.sql           |   6 +-
 src/test/regress/sql/select_distinct.sql      | 118 ++++++
 5 files changed, 512 insertions(+), 50 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 654fa1fe35..52fa7c5a45 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3812,7 +3812,27 @@ create_grouping_paths(PlannerInfo *root,
 {
 	Query	   *parse = root->parse;
 	RelOptInfo *grouped_rel;
+	bool	group_unique_input = false;
 	RelOptInfo *partially_grouped_rel;
+	List	*groupExprs = NIL;
+
+	if (root->parse->groupingSets == NIL)
+	{
+		groupExprs = get_sortgrouplist_exprs(parse->groupClause,
+											 parse->targetList);
+		/*
+		 * If the GROUP BY clause is unique already, the grouping node is
+		 * not necessary when there are no aggregate functions.
+		 */
+		group_unique_input = relation_has_uniquekeys_for(root, input_rel,
+														 groupExprs, false);
+		if (group_unique_input &&
+			groupExprs != NIL &&
+			!parse->hasAggs &&
+			!parse->hasWindowFuncs &&
+			parse->havingQual == NULL)
+			return input_rel;
+	}
 
 	/*
 	 * Create grouping relation to hold fully aggregated grouping and/or
@@ -4739,6 +4759,12 @@ create_distinct_paths(PlannerInfo *root,
 	bool		allow_hash;
 	Path	   *path;
 	ListCell   *lc;
+	List	   *distinctExprs =  get_sortgrouplist_exprs(parse->distinctClause,
+														 parse->targetList);
+
+	/* If the result is unique already, we just return the input_rel directly */
+	if (relation_has_uniquekeys_for(root, input_rel, distinctExprs, false))
+		return input_rel;
 
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
@@ -4776,10 +4802,6 @@ create_distinct_paths(PlannerInfo *root,
 		/*
 		 * Otherwise, the UNIQUE filter has effects comparable to GROUP BY.
 		 */
-		List	   *distinctExprs;
-
-		distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
-												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
 											  NULL);
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 14cdcfcca6..42bd180895 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -870,14 +870,12 @@ explain (costs off)
   select distinct max(unique2) from tenk1;
                              QUERY PLAN                              
 ---------------------------------------------------------------------
- HashAggregate
-   Group Key: $0
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
                  Index Cond: (unique2 IS NOT NULL)
-   ->  Result
-(7 rows)
+(5 rows)
 
 select distinct max(unique2) from tenk1;
  max  
@@ -1036,7 +1034,7 @@ explain (costs off)
   select distinct min(f1), max(f1) from minmaxtest;
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
- Unique
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Merge Append
@@ -1059,10 +1057,7 @@ explain (costs off)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
-   ->  Sort
-         Sort Key: ($0), ($1)
-         ->  Result
-(26 rows)
+(23 rows)
 
 select distinct min(f1), max(f1) from minmaxtest;
  min | max 
@@ -1092,12 +1087,10 @@ create temp table t2 (x int, y int, z int, primary key (x, y));
 create temp table t3 (a int, b int, c int, primary key(a, b) deferrable);
 -- Non-primary-key columns can be removed from GROUP BY
 explain (costs off) select * from t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 -- No removal can happen if the complete PK is not present in GROUP BY
 explain (costs off) select a,c from t1 group by a,c,d;
@@ -1109,29 +1102,27 @@ explain (costs off) select a,c from t1 group by a,c,d;
 (3 rows)
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.y
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
                       QUERY PLAN                      
 ------------------------------------------------------
  HashAggregate
    Group Key: t1.a, t1.b, t2.x, t2.z
-   ->  Hash Join
+   ->  Hash Left Join
          Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
          ->  Seq Scan on t2
          ->  Hash
@@ -1161,12 +1152,10 @@ explain (costs off) select * from t1 group by a,b,c,d;
 
 -- Okay to remove columns if we're only querying the parent.
 explain (costs off) select * from only t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 create temp table p_t1 (
   a int,
@@ -1179,14 +1168,12 @@ create temp table p_t1_1 partition of p_t1 for values in(1);
 create temp table p_t1_2 partition of p_t1 for values in(2);
 -- Ensure we can remove non-PK columns for partitioned tables.
 explain (costs off) select * from p_t1 group by a,b,c,d;
-           QUERY PLAN           
---------------------------------
- HashAggregate
-   Group Key: p_t1.a, p_t1.b
-   ->  Append
-         ->  Seq Scan on p_t1_1
-         ->  Seq Scan on p_t1_2
-(5 rows)
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on p_t1_1
+   ->  Seq Scan on p_t1_2
+(3 rows)
 
 drop table t1 cascade;
 NOTICE:  drop cascades to table t1c
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 11c6f50fbf..eeee17b468 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -306,3 +306,338 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+    QUERY PLAN    
+------------------
+ Seq Scan on uqk1
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+           QUERY PLAN            
+---------------------------------
+ HashAggregate
+   Group Key: d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+     QUERY PLAN     
+--------------------
+ Seq Scan on uqk1
+   Filter: (pk = 1)
+(2 rows)
+
+-- Test join
+-- both uqk1(c, d) and uqk2(pk) are unique keys, so distinct is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+           QUERY PLAN            
+---------------------------------
+ Hash Join
+   Hash Cond: (uqk1.a = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+   ->  Hash
+         ->  Seq Scan on uqk2
+(6 rows)
+
+-- Distinct is needed since the outer join can produce multiple NULL values.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+                 QUERY PLAN                  
+---------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.pk
+         ->  Hash Right Join
+               Hash Cond: (uqk1.a = uqk2.pk)
+               ->  Seq Scan on uqk1
+               ->  Hash
+                     ->  Seq Scan on uqk2
+(8 rows)
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+ pk 
+----
+  1
+   
+(2 rows)
+
+-- Distinct is not needed since uqk1 is the left table in the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+           QUERY PLAN            
+---------------------------------
+ Sort
+   Sort Key: uqk1.c, uqk1.d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+ c | d 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+(3 rows)
+
+-- The combination of UniqueKeys is still unique no matter what the join clause is
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+          QUERY PLAN          
+------------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+   ->  Materialize
+         ->  Seq Scan on uqk2
+(4 rows)
+
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+ a | pk | c | d | a | pk | c | d 
+---+----+---+---+---+----+---+---
+ 1 |  1 | 1 | 1 | 1 |  1 | 1 | 1
+ 1 |  1 | 1 | 1 | 4 |  4 | 4 | 4
+ 1 |  1 | 1 | 1 | 5 |  5 | 5 | 5
+ 2 |  2 | 2 | 2 | 1 |  1 | 1 | 1
+ 2 |  2 | 2 | 2 | 4 |  4 | 4 | 4
+ 2 |  2 | 2 | 2 | 5 |  5 | 5 | 5
+ 3 |  3 | 3 | 3 | 1 |  1 | 1 | 1
+ 3 |  3 | 3 | 3 | 4 |  4 | 4 | 4
+ 3 |  3 | 3 | 3 | 5 |  5 | 5 | 5
+(9 rows)
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+           QUERY PLAN           
+--------------------------------
+ Hash Semi Join
+   Hash Cond: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+   ->  Hash
+         ->  Seq Scan on uqk2
+(5 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+             QUERY PLAN             
+------------------------------------
+ Seq Scan on uqk1
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on uqk2
+(4 rows)
+
+-- Test the UniqueKey for the one-row case; DISTINCT is not needed here either.
+-- uqk1.d is a uniquekey due to the onerow rule; uqk2.pk is the primary key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+            QUERY PLAN             
+-----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+(5 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both uqk1.d and uqk2.c are uniquekeys due to the onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+            QUERY PLAN            
+----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(6 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both UniqueKeys are in the targetList, so distinct is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+        QUERY PLAN        
+--------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(5 rows)
+
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+ c | c 
+---+---
+ 2 | 1
+(1 row)
+
+-- Test SubQuery
+-- t2(a, c) is a unique key because of GROUP BY
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+              QUERY PLAN               
+---------------------------------------
+ Hash Join
+   Hash Cond: ((sum(uqk2.pk)) = t1.pk)
+   ->  HashAggregate
+         Group Key: uqk2.a, uqk2.c
+         ->  Seq Scan on uqk2
+   ->  Hash
+         ->  Seq Scan on uqk1 t1
+(7 rows)
+
+-- Test partitioned tables
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+-- Create a unique index on the partitioned table dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP INDEX dist_p_uk_b_c;
+-- we also support a unique index on each child table
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- Not OK, since dist_p1 has no such index
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ HashAggregate
+   Group Key: dist_p.a, dist_p.b, dist_p.c, dist_p.d
+   ->  Append
+         ->  Seq Scan on dist_p0 dist_p_1
+         ->  Seq Scan on dist_p1 dist_p_2
+(5 rows)
+
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP INDEX dist_p0_uk_bc;
+DROP INDEX dist_p1_uk_bc;
+-- The unique key is the same on all child tables, but it doesn't include the partkey, so it is not OK either.
+CREATE UNIQUE INDEX dist_p0_uk_c ON dist_p0(c);
+CREATE UNIQUE INDEX dist_p1_uk_c ON dist_p1(c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ HashAggregate
+   Group Key: dist_p.a, dist_p.b, dist_p.c, dist_p.d
+   ->  Append
+         ->  Seq Scan on dist_p0 dist_p_1
+         ->  Seq Scan on dist_p1 dist_p_2
+(5 rows)
+
+DROP TABLE dist_p;
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+DEALLOCATE pt;
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 9480abd577..3446c3e9fd 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -394,13 +394,13 @@ explain (costs off) select * from t1 group by a,b,c,d;
 explain (costs off) select a,c from t1 group by a,c,d;
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
 
 -- Cannot optimize when PK is deferrable
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 33102744eb..19e8a3e4be 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -135,3 +135,121 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+
+
+-- Test join
+-- both uqk1(c, d) and uqk2(pk) are unique keys, so distinct is not needed.
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+
+-- Distinct is needed since the outer join can produce multiple NULL values.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+
+-- Distinct is not needed since uqk1 is the left table in the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+-- The combination of UniqueKeys is still unique no matter what the join clause is
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+
+-- Test the UniqueKey for the one-row case; DISTINCT is not needed here either.
+-- uqk1.d is a uniquekey due to the onerow rule; uqk2.pk is the primary key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+
+-- Both uqk1.d and uqk2.c are uniquekeys due to the onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+
+-- Both UniqueKeys are in the targetList, so distinct is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+
+-- Test SubQuery
+-- t2(a, c) is a unique key because of GROUP BY
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+
+-- Test partitioned tables
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+
+-- Create a unique index on the partitioned table dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+DROP INDEX dist_p_uk_b_c;
+
+-- we also support a unique index on each child table
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- Not OK, since dist_p1 has no such index
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+
+DROP INDEX dist_p0_uk_bc;
+DROP INDEX dist_p1_uk_bc;
+
+-- The unique key is the same on all child tables, but it doesn't include the partkey, so it is not OK either.
+CREATE UNIQUE INDEX dist_p0_uk_c ON dist_p0(c);
+CREATE UNIQUE INDEX dist_p1_uk_c ON dist_p1(c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+
+DROP TABLE dist_p;
+
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+DEALLOCATE pt;
+
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
-- 
2.21.0

Attachment: v7-0003-Refactor-existing-uniqueness-related-code-to-use-.patch (application/octet-stream)
From f63aad8ee45cf410ed7e3cc716287f99bcc72072 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 6 May 2020 15:44:15 +0800
Subject: [PATCH v7 3/6] Refactor existing uniqueness related code to use
 UniqueKey

The calls to remove_useless_joins and reduce_unique_semijoins are
postponed as well due to this change.
---
 .../postgres_fdw/expected/postgres_fdw.out    |  32 ++--
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   1 +
 src/backend/optimizer/path/allpaths.c         |  18 ++-
 src/backend/optimizer/plan/analyzejoins.c     | 137 ++++--------------
 src/backend/optimizer/plan/planmain.c         |  13 --
 src/test/regress/expected/join.out            |  59 ++++----
 src/test/regress/sql/join.sql                 |  16 +-
 7 files changed, 97 insertions(+), 179 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..d71b6cc556 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1378,6 +1378,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1
 (8 rows)
 
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY: The inner table in the plan is "T 3", which has a primary key on c1; that's why we now see the newly added "Inner Unique: true".
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
                                                                                                                                                                                              QUERY PLAN                                                                                                                                                                                             
@@ -1386,6 +1387,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
    Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
    ->  Nested Loop
          Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+         Inner Unique: true
          ->  Foreign Scan
                Output: ft4.c1, ft4.*, ft5.c1, ft5.*
                Relations: (public.ft4) FULL JOIN (public.ft5)
@@ -1410,7 +1412,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
                ->  Seq Scan on "S 1"."T 3"
                      Output: "T 3".c1, "T 3".ctid
                      Filter: ("T 3".c1 = 50)
-(28 rows)
+(29 rows)
 
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
  c1 | a  | b  
@@ -2902,22 +2904,20 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 f
 -- Outer query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
- Unique
+                                                       QUERY PLAN                                                       
+------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: ((SubPlan 1))
-   ->  Sort
-         Output: ((SubPlan 1))
-         Sort Key: ((SubPlan 1))
-         ->  Foreign Scan
-               Output: (SubPlan 1)
-               Relations: Aggregate on (public.ft2 t2)
-               Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
-               SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+   Sort Key: ((SubPlan 1))
+   ->  Foreign Scan
+         Output: (SubPlan 1)
+         Relations: Aggregate on (public.ft2 t2)
+         Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
+         SubPlan 1
+           ->  Foreign Scan on public.ft1 t1
+                 Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                 Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(11 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..a42cfa134d 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -453,6 +453,7 @@ EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY: The inner table in the plan is "T 3", which has a primary key on c1; that's why the plan now shows the newly added "Inner Unique: true".
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 96bbd3d1cf..166e3bd0ba 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -39,6 +39,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
@@ -222,13 +223,24 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	set_base_rel_pathlists(root);
 
 	/*
-	 * Generate access paths for the entire join tree.
+	 * Remove any useless outer joins.  Ideally this would be done during
+	 * jointree preprocessing, but the necessary information isn't available
+	 * until we've built baserel data structures, classified qual clauses,
+	 * and computed uniquekeys.
 	 */
-	rel = make_rel_from_joinlist(root, joinlist);
+	joinlist = remove_useless_joins(root, joinlist);
+
+	/*
+	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
+	 * Likewise, this can't be done until now for lack of needed info.
+	 */
+	reduce_unique_semijoins(root);
 
 	/*
-	 * The result should join all and only the query's base rels.
+	 * Generate access paths for the entire join tree.
 	 */
+	rel = make_rel_from_joinlist(root, joinlist);
+
 	Assert(bms_equal(rel->relids, root->all_baserels));
 
 	return rel;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0ff660284..d80bff65d2 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -439,6 +439,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
 	 * There may be references to the rel in root->fkey_list, but if so,
 	 * match_foreign_keys_to_quals() will get rid of them.
 	 */
+
+	root->all_baserels = bms_del_member(root->all_baserels, relid);
 }
 
 /*
@@ -584,39 +586,7 @@ reduce_unique_semijoins(PlannerInfo *root)
 static bool
 rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 {
-	/* We only know about baserels ... */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * For a plain relation, we only know how to prove uniqueness by
-		 * reference to unique indexes.  Make sure there's at least one
-		 * suitable unique index.  It must be immediately enforced, and if
-		 * it's a partial index, it must match the query.  (Keep these
-		 * conditions in sync with relation_has_unique_index_for!)
-		 */
-		ListCell   *lc;
-
-		foreach(lc, rel->indexlist)
-		{
-			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
-
-			if (ind->unique && ind->immediate &&
-				(ind->indpred == NIL || ind->predOK))
-				return true;
-		}
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Query	   *subquery = root->simple_rte_array[rel->relid]->subquery;
-
-		/* Check if the subquery has any qualities that support distinctness */
-		if (query_supports_distinctness(subquery))
-			return true;
-	}
-	/* We have no proof rules for any other rtekinds. */
-	return false;
+	return rel->uniquekeys != NIL;
 }
 
 /*
@@ -640,83 +610,33 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 static bool
 rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
 {
-	/*
-	 * We could skip a couple of tests here if we assume all callers checked
-	 * rel_supports_distinctness first, but it doesn't seem worth taking any
-	 * risk for.
-	 */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * Examine the indexes to see if we have a matching unique index.
-		 * relation_has_unique_index_for automatically adds any usable
-		 * restriction clauses for the rel, so we needn't do that here.
-		 */
-		if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
-			return true;
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Index		relid = rel->relid;
-		Query	   *subquery = root->simple_rte_array[relid]->subquery;
-		List	   *colnos = NIL;
-		List	   *opids = NIL;
-		ListCell   *l;
 
-		/*
-		 * Build the argument lists for query_is_distinct_for: a list of
-		 * output column numbers that the query needs to be distinct over, and
-		 * a list of equality operators that the output columns need to be
-		 * distinct according to.
-		 *
-		 * (XXX we are not considering restriction clauses attached to the
-		 * subquery; is that worth doing?)
-		 */
-		foreach(l, clause_list)
+	ListCell   *lc1, *lc2, *lc3;
+	foreach(lc1, rel->uniquekeys)
+	{
+		UniqueKey *uqk = lfirst_node(UniqueKey, lc1);
+		bool all_uqk_exprs_found = true;
+		foreach(lc2, uqk->exprs)
 		{
-			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
-			Oid			op;
-			Var		   *var;
-
-			/*
-			 * Get the equality operator we need uniqueness according to.
-			 * (This might be a cross-type operator and thus not exactly the
-			 * same operator the subquery would consider; that's all right
-			 * since query_is_distinct_for can resolve such cases.)  The
-			 * caller's mergejoinability test should have selected only
-			 * OpExprs.
-			 */
-			op = castNode(OpExpr, rinfo->clause)->opno;
-
-			/* caller identified the inner side for us */
-			if (rinfo->outer_is_left)
-				var = (Var *) get_rightop(rinfo->clause);
-			else
-				var = (Var *) get_leftop(rinfo->clause);
-
-			/*
-			 * We may ignore any RelabelType node above the operand.  (There
-			 * won't be more than one, since eval_const_expressions() has been
-			 * applied already.)
-			 */
-			if (var && IsA(var, RelabelType))
-				var = (Var *) ((RelabelType *) var)->arg;
-
-			/*
-			 * If inner side isn't a Var referencing a subquery output column,
-			 * this clause doesn't help us.
-			 */
-			if (!var || !IsA(var, Var) ||
-				var->varno != relid || var->varlevelsup != 0)
-				continue;
-
-			colnos = lappend_int(colnos, var->varattno);
-			opids = lappend_oid(opids, op);
+			Node *uq_expr = lfirst(lc2);
+			bool find_uq_exprs_in_clause_list = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *clause_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				if (equal(uq_expr, clause_expr))
+				{
+					find_uq_exprs_in_clause_list = true;
+					break;
+				}
+			}
+			if (!find_uq_exprs_in_clause_list)
+			{
+				all_uqk_exprs_found = false;
+				break;
+			}
 		}
-
-		if (query_is_distinct_for(subquery, colnos, opids))
+		if (all_uqk_exprs_found)
 			return true;
 	}
 	return false;
@@ -972,6 +892,9 @@ innerrel_is_unique(PlannerInfo *root,
 	MemoryContext old_context;
 	ListCell   *lc;
 
+	if (relation_is_onerow(innerrel))
+		return true;
+
 	/* Certainly can't prove uniqueness when there are no joinclauses */
 	if (restrictlist == NIL)
 		return false;
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..6ad73cb57b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -213,19 +213,6 @@ query_planner(PlannerInfo *root,
 	 */
 	fix_placeholder_input_needed_levels(root);
 
-	/*
-	 * Remove any useless outer joins.  Ideally this would be done during
-	 * jointree preprocessing, but the necessary information isn't available
-	 * until we've built baserel data structures and classified qual clauses.
-	 */
-	joinlist = remove_useless_joins(root, joinlist);
-
-	/*
-	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
-	 * Likewise, this can't be done until now for lack of needed info.
-	 */
-	reduce_unique_semijoins(root);
-
 	/*
 	 * Now distribute "placeholders" to base rels as needed.  This has to be
 	 * done after join removal because removal could change whether a
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a46b1573bd..8378936eda 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4349,11 +4349,11 @@ reset enable_nestloop;
 --
 begin;
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 -- all three cases should be optimizable into a simple seqscan
@@ -4411,40 +4411,37 @@ select d.* from d left join (select distinct * from b) s
 (1 row)
 
 -- join removal is not possible when the GROUP BY contains a column that is
--- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
--- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
--- but this happens too late for join removal in the outer plan level.)
+-- not in the join condition.
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
-                QUERY PLAN                
-------------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Group
-         Group Key: b.id
-         ->  Index Scan using b_pkey on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.d, b.c_id
+                     ->  Seq Scan on b
 (8 rows)
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(9 rows)
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.c_id, b.d
+                     ->  Seq Scan on b
+(8 rows)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 1403e0ffe7..3312542411 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1479,11 +1479,11 @@ reset enable_nestloop;
 begin;
 
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 
@@ -1512,17 +1512,15 @@ select d.* from d left join (select distinct * from b) s
   on d.a = s.id and d.b = s.c_id;
 
 -- join removal is not possible when the GROUP BY contains a column that is
--- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
--- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
--- but this happens too late for join removal in the outer plan level.)
+-- not in the join condition.
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
-- 
2.21.0
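As a quick illustration of the join-removal rule exercised by the regression tests above (the tables mirror the ones created in join.sql; this is only a sketch of the behavior the expected output in join.out encodes, not part of the patch):

```sql
CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
CREATE TEMP TABLE d (a int, b int);

-- Removable: the subquery preserves b's primary-key UniqueKey on id, and the
-- join clause d.a = s.id covers it, so the left join cannot duplicate rows of
-- d and the plan collapses to a plain seqscan on d.
EXPLAIN (COSTS OFF)
SELECT d.* FROM d LEFT JOIN (SELECT DISTINCT * FROM b) s
  ON d.a = s.id AND d.b = s.c_id;

-- Not removable: (c_id, d) is the UniqueKey coming from the GROUP BY, but the
-- join condition matches only s.d, so s may still duplicate rows of d.
EXPLAIN (COSTS OFF)
SELECT d.* FROM d LEFT JOIN (SELECT d, c_id FROM b GROUP BY b.d, b.c_id) s
  ON d.a = s.d;
```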

Attachment: v7-0002-Introuduce-RelOptInfo.uniquekeys-attribute.patch (application/octet-stream)
From cb41895e5f32a7ca79d604165dcbb5a0ac36d0d9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sun, 3 May 2020 22:51:57 +0800
Subject: [PATCH v7 2/6] Introduce RelOptInfo.uniquekeys attribute

---
 src/backend/nodes/list.c                    |   31 +
 src/backend/nodes/makefuncs.c               |   21 +
 src/backend/optimizer/path/Makefile         |    3 +-
 src/backend/optimizer/path/README.uniquekey |   71 ++
 src/backend/optimizer/path/allpaths.c       |    7 +
 src/backend/optimizer/path/joinrels.c       |    2 +
 src/backend/optimizer/path/pathkeys.c       |    3 +-
 src/backend/optimizer/path/uniquekeys.c     | 1135 +++++++++++++++++++
 src/backend/optimizer/plan/planner.c        |   10 +-
 src/backend/optimizer/prep/prepunion.c      |    2 +
 src/include/nodes/makefuncs.h               |    3 +
 src/include/nodes/nodes.h                   |    1 +
 src/include/nodes/pathnodes.h               |   26 +
 src/include/nodes/pg_list.h                 |    2 +
 src/include/optimizer/optimizer.h           |    2 +
 src/include/optimizer/paths.h               |   35 +
 16 files changed, 1349 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/optimizer/path/README.uniquekey
 create mode 100644 src/backend/optimizer/path/uniquekeys.c

diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 80fa8c84e4..aa47eaed36 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -687,6 +687,37 @@ list_member_oid(const List *list, Oid datum)
 	return false;
 }
 
+/*
+ * Return true iff there is an equal member in target for every
+ * member in members
+ */
+bool
+list_is_subset(const List *members, const List *target)
+{
+	const ListCell	*lc1, *lc2;
+
+	Assert(IsPointerList(members));
+	Assert(IsPointerList(target));
+	check_list_invariants(members);
+	check_list_invariants(target);
+
+	foreach(lc1, members)
+	{
+		bool found = false;
+		foreach(lc2, target)
+		{
+			if (equal(lfirst(lc1), lfirst(lc2)))
+			{
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			return false;
+	}
+	return true;
+}
+
 /*
  * Delete the n'th cell (counting from 0) in list.
  *
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index b442b5a29e..3155e5c372 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -812,3 +812,24 @@ makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols)
 	v->va_cols = va_cols;
 	return v;
 }
+
+
+/*
+ * makeUniqueKey
+ */
+UniqueKey*
+makeUniqueKey(List *exprs, bool multi_nullvals, bool onerow)
+{
+	UniqueKey * ukey = makeNode(UniqueKey);
+	if (onerow)
+	{
+		Assert(exprs == NIL);
+		Assert(!multi_nullvals);
+	}
+	else
+		Assert(exprs != NIL);
+	ukey->exprs = exprs;
+	ukey->multi_nullvals = multi_nullvals;
+	ukey->onerow = onerow;
+	return ukey;
+}
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..7b9820c25f 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekeys.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/README.uniquekey b/src/backend/optimizer/path/README.uniquekey
new file mode 100644
index 0000000000..dc0d6490f2
--- /dev/null
+++ b/src/backend/optimizer/path/README.uniquekey
@@ -0,0 +1,71 @@
+1. What is UniqueKey?
+A UniqueKey is a set of exprs for a RelOptInfo which we are sure does not
+yield the same result among any of the rows. The simplest example of a
+UniqueKey is a primary key.
+
+We define the UniqueKey struct as below.
+
+typedef struct UniqueKey
+{
+	NodeTag	type;
+	List	*exprs;
+	bool	multi_nullvals;
+	bool	onerow;
+} UniqueKey;
+
+multi_nullvals tracks whether the exprs allow multiple null values. For a
+unique index whose columns are not marked NOT NULL, multi_nullvals is true.
+
+onerow marks a UniqueKey which means the RelOptInfo will have at most 1 row.
+It has stronger semantics than the others: in SELECT uk FROM t, uk is a
+normal unique key and may have different values, while in
+SELECT colx FROM t WHERE uk = const, colx is unique AND has only 1 value. This
+field can be used for innerrel_is_unique, and also as an optimization for
+this case: we don't need to maintain multiple UniqueKeys, just one with
+onerow = true and exprs = NIL.
+
+onerow is set to true in only 2 cases right now: 1) SELECT .. FROM t WHERE
+uk = 1;  2) SELECT aggref(xx) FROM t;  -- without a GROUP BY.
+
+The UniqueKey can be used at the following cases at least:
+1. remove_useless_joins.
+2. reduce_unique_semijoins.
+3. remove distinct node if distinct clause is unique.
+4. remove aggnode if group by clause is unique.
+5. Aggregation push down without 2-phase aggregation if the join can't
+   duplicate the aggregated rows. (work-in-progress feature)
+
+
+2. How is it maintained?
+
+We have a set of populate_xxx_uniquekeys functions to maintain the uniquekeys
+in various cases, where xxx is one of baserel, joinrel, partitionedrel,
+distinctrel, groupedrel, unionrel. We also need to convert the uniquekeys from
+a subquery to the outer relation, which is what convert_subquery_uniquekeys does.
+
+The most error-prone part is joinrel; we simplified the rules as below:
+1. If the relation's UniqueKey can't be duplicated after the join, then it is
+   still valid for the join rel. The function used here is
+   innerrel_keeps_unique. The basic idea is innerrel.any_col = outer.uk.
+
+2. If a UniqueKey can't stay valid via rule 1, the combination of the
+   UniqueKeys from both sides is valid for sure.  We can prove this as: if the
+   unique exprs from rel1 are duplicated by rel2, the duplicated rows must
+   contain different unique exprs from rel2.
+
+
+More considerations about onerow:
+1. If a relation has one row and it can't be duplicated, it may still
+   contain multi_nullvals after an outer join.
+2. If either UniqueKey can be duplicated after the join, the joinrel can be
+   one row only when both sides are one row AND there is no outer join.
+3. Whenever a onerow UniqueKey is no longer valid, we need to convert the
+   one-row UniqueKey to a normal UniqueKey, since we don't store exprs for a
+   one-row relation. get_exprs_from_uniquekey is used here.
+
+
+More considerations about multi_nullvals after join:
+1. If the original UniqueKey has multi_nullvals, the final UniqueKey will have
+   multi_nullvals in any case.
+2. If a UniqueKey doesn't allow multi_nullvals, it may start to allow them
+   after some outer joins.
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 255f56b827..96bbd3d1cf 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -786,6 +786,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
+
+	/* Set UniqueKeys for this relation */
+	populate_baserel_uniquekeys(root, rel, rel->indexlist);
 }
 
 /*
@@ -1276,6 +1279,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Add paths to the append relation. */
 	add_paths_to_append_rel(root, rel, live_childrels);
+	if (IS_PARTITIONED_REL(rel))
+		populate_partitionedrel_uniquekeys(root, rel, live_childrels);
 }
 
 
@@ -2349,6 +2354,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 										  pathkeys, required_outer));
 	}
 
+	convert_subquery_uniquekeys(root, rel, sub_final_rel);
+
 	/* If outer rel allows parallelism, do same for partial paths. */
 	if (rel->consider_parallel && bms_is_empty(required_outer))
 	{
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 4e1650994d..c0d2332caf 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -924,6 +924,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
 
 	/* Apply partitionwise join technique, if possible. */
 	try_partitionwise_join(root, rel1, rel2, joinrel, sjinfo, restrictlist);
+
+	populate_joinrel_uniquekeys(root, joinrel, rel1, rel2, restrictlist, sjinfo->jointype);
 }
 
 
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 21e3f5a987..c56536a0f9 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -33,7 +33,6 @@ static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
 											 int partkeycol);
-static Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 
 
@@ -1035,7 +1034,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
  * We need this to ensure that we don't return pathkeys describing values
  * that are unavailable above the level of the subquery scan.
  */
-static Var *
+Var *
 find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle)
 {
 	ListCell   *lc;
diff --git a/src/backend/optimizer/path/uniquekeys.c b/src/backend/optimizer/path/uniquekeys.c
new file mode 100644
index 0000000000..957af61819
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekeys.c
@@ -0,0 +1,1135 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekeys.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/tlist.h"
+#include "rewrite/rewriteManip.h"
+
+
+static List *gather_mergeable_baserestrictlist(RelOptInfo *rel);
+static List *gather_mergeable_joinclauses(RelOptInfo *joinrel,
+										  RelOptInfo *rel1,
+										  RelOptInfo *rel2,
+										  List *restrictlist,
+										  JoinType jointype);
+static bool match_index_to_baserestrictinfo(IndexOptInfo *unique_ind,
+											List *restrictlist);
+
+/*
+ * This struct is used to help populate_joinrel_uniquekeys.
+ *
+ * added_to_joinrel is true if a uniquekey (from outerrel or innerrel)
+ * has been added to joinrel.
+ * useful is true if the exprs of the uniquekey still appears in joinrel.
+ */
+typedef struct UniqueKeyContextData
+{
+	UniqueKey	*uniquekey;
+	bool	added_to_joinrel;
+	bool	useful;
+} *UniqueKeyContext;
+
+static List *initililze_uniquecontext_for_joinrel(RelOptInfo *inputrel);
+static bool innerrel_keeps_unique(PlannerInfo *root,
+								  RelOptInfo *outerrel,
+								  RelOptInfo *innerrel,
+								  List *restrictlist,
+								  bool reverse);
+static List *get_exprs_from_uniquekey(RelOptInfo *joinrel,
+									  RelOptInfo *rel1,
+									  UniqueKey *ukey);
+static bool clause_sides_match_join(RestrictInfo *rinfo,
+									Relids outerrelids,
+									Relids innerrelids);
+static void add_uniquekey_from_index(RelOptInfo *rel,
+									 IndexOptInfo *unique_index);
+static void add_uniquekey_for_onerow(RelOptInfo *rel);
+static bool add_combined_uniquekey(RelOptInfo *joinrel,
+								   RelOptInfo *outer_rel,
+								   RelOptInfo *inner_rel,
+								   UniqueKey *outer_ukey,
+								   UniqueKey *inner_ukey,
+								   JoinType jointype);
+
+/* Used for unique indexes checking for partitioned table */
+static bool index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind);
+static IndexOptInfo *simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+													 IndexOptInfo *from);
+static bool simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2);
+static void adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+											  RelOptInfo *childrel,
+											  List **global_unique_index);
+
+/* Helper function for grouped relation and distinct relation. */
+static void add_uniquekey_from_sortgroups(PlannerInfo *root,
+										  RelOptInfo *rel,
+										  List *sortgroups);
+
+/*
+ * populate_baserel_uniquekeys
+ *		Populate 'baserel' uniquekeys list by looking at the rel's unique index
+ *		Populate 'baserel' uniquekeys list by looking at the rel's unique
+ * indexes and baserestrictinfo.
+void
+populate_baserel_uniquekeys(PlannerInfo *root,
+							RelOptInfo *baserel,
+							List *indexlist)
+{
+	ListCell *lc;
+	List	*restrictlist;
+	bool	return_one_row = false;
+	List	*matched_uniq_indexes = NIL;
+
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	if (baserel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+		/*
+		 * Setting a UniqueKey on a member rel is useless; we have to recompute
+		 * it at the upper level, see populate_partitionedrel_uniquekeys.
+		 */
+		return;
+
+	restrictlist = gather_mergeable_baserestrictlist(baserel);
+
+	foreach(lc, indexlist)
+	{
+		IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		if (match_index_to_baserestrictinfo(ind, restrictlist))
+		{
+			return_one_row = true;
+			break;
+		}
+
+		if (ind->indexprs != NIL)
+			/* XXX: ignore index on expression so far */
+			continue;
+		matched_uniq_indexes = lappend(matched_uniq_indexes, ind);
+	}
+
+	if (return_one_row)
+	{
+		add_uniquekey_for_onerow(baserel);
+	}
+	else
+	{
+		foreach(lc, matched_uniq_indexes)
+			add_uniquekey_from_index(baserel, lfirst_node(IndexOptInfo, lc));
+	}
+}
+
+
+/*
+ * populate_partitionedrel_uniquekeys
+ * A unique index can be used for a UniqueKey only if:
+ * 1). It includes the partition key.
+ * 2). It exists in all the related partitions.
+ */
+void
+populate_partitionedrel_uniquekeys(PlannerInfo *root,
+								   RelOptInfo *rel,
+								   List *childrels)
+{
+	ListCell	*lc;
+	List	*global_uniq_indexlist = NIL;
+	RelOptInfo *childrel;
+	bool is_first = true;
+
+	Assert(IS_PARTITIONED_REL(rel));
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	if (childrels == NIL)
+		return;
+
+	childrel = linitial_node(RelOptInfo, childrels);
+	foreach(lc, childrel->indexlist)
+	{
+		IndexOptInfo *ind = lfirst(lc);
+		IndexOptInfo *modified_index;
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		modified_index = simple_copy_indexinfo_to_parent(rel, ind);
+		/*
+		 * If the unique index doesn't contain partkey, then it is unique
+		 * on this partition only, so it is useless for us.
+		 */
+		if (!index_constains_partkey(rel, modified_index))
+			continue;
+		global_uniq_indexlist = lappend(global_uniq_indexlist,  modified_index);
+	}
+
+	/* Fast path */
+	if (global_uniq_indexlist == NIL)
+		return;
+
+	foreach(lc, childrels)
+	{
+		RelOptInfo *child = lfirst(lc);
+		if (is_first)
+		{
+			is_first = false;
+			continue;
+		}
+		adjust_partition_unique_indexlist(rel, child, &global_uniq_indexlist);
+	}
+
+	/* Now we have a list of unique indexes which are exactly the same on all
+	 * childrels; set the UniqueKey just as if it were a non-partitioned table.
+	 */
+	populate_baserel_uniquekeys(root, rel, global_uniq_indexlist);
+}
+
+
+/*
+ * populate_distinctrel_uniquekeys
+ */
+void
+populate_distinctrel_uniquekeys(PlannerInfo *root,
+								RelOptInfo *inputrel,
+								RelOptInfo *distinctrel)
+{
+	/* The unique key before the distinct is still valid. */
+	distinctrel->uniquekeys = list_copy(inputrel->uniquekeys);
+	add_uniquekey_from_sortgroups(root, distinctrel, root->parse->distinctClause);
+}
+
+/*
+ * populate_grouprel_uniquekeys
+ */
+void
+populate_grouprel_uniquekeys(PlannerInfo *root,
+							 RelOptInfo *grouprel)
+{
+	Query *parse = root->parse;
+
+	if (parse->hasTargetSRFs)
+		return;
+
+	if (parse->groupingSets)
+		return;
+
+	/* A Normal group by without grouping set. */
+	if (parse->groupClause)
+		add_uniquekey_from_sortgroups(root,
+									  grouprel,
+									  root->parse->groupClause);
+	else
+		/* It has aggregation but without a group by, so only one row returned */
+		add_uniquekey_for_onerow(grouprel);
+}
+
+/*
+ * simple_copy_uniquekeys
+ * Wrapping this one-line assignment in a function makes it easy to find
+ * the places where we simply copy the uniquekeys.
+ */
+void
+simple_copy_uniquekeys(RelOptInfo *oldrel,
+					   RelOptInfo *newrel)
+{
+	newrel->uniquekeys = oldrel->uniquekeys;
+}
+
+/*
+ *  populate_unionrel_uniquekeys
+ */
+void
+populate_unionrel_uniquekeys(PlannerInfo *root,
+							  RelOptInfo *unionrel)
+{
+	ListCell	*lc;
+	List	*exprs = NIL;
+
+	Assert(unionrel->uniquekeys == NIL);
+
+	foreach(lc, unionrel->reltarget->exprs)
+	{
+		exprs = lappend(exprs, lfirst(lc));
+	}
+
+	if (exprs == NIL)
+		/* SQL: select union select; is valid, we need to handle it here. */
+		return;
+	unionrel->uniquekeys = lappend(unionrel->uniquekeys,
+								   makeUniqueKey(exprs,
+												 false, /* allows_multinull */
+												 false  /* onerow */));
+}
+
+/*
+ * populate_joinrel_uniquekeys
+ *
+ * Populate the uniquekeys for a joinrel.  We check each input relation to
+ * see if its UniqueKeys are still valid via innerrel_keeps_unique; if so,
+ * we add them to the joinrel.  The multi_nullvals field is changed from
+ * false to true for some outer join cases, and a one-row UniqueKey needs
+ * to be converted to a normal UniqueKey in the same cases.
+ *
+ * For the UniqueKeys of either input rel which can't stay unique after the
+ * join, we still check whether a combination of UniqueKeys from both sides
+ * is useful for the joinrel.  If so, we add it to the joinrel as well.
+ */
+void
+populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List *restrictlist, JoinType jointype)
+{
+	ListCell *lc, *lc2;
+	List	*clause_list = NIL;
+	List	*outerrel_ukey_ctx;
+	List	*innerrel_ukey_ctx;
+	bool	inner_onerow, outer_onerow;
+
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	/* Care about the outerrel relation only for SEMI/ANTI join */
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+	{
+		foreach(lc, outerrel->uniquekeys)
+		{
+			UniqueKey	*uniquekey = lfirst_node(UniqueKey, lc);
+			if (list_is_subset(uniquekey->exprs, joinrel->reltarget->exprs))
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, uniquekey);
+		}
+		return;
+	}
+
+	Assert(jointype == JOIN_LEFT || jointype == JOIN_FULL || jointype == JOIN_INNER);
+
+	/* Fast path */
+	if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+		return;
+
+	inner_onerow = relation_is_onerow(innerrel);
+	outer_onerow = relation_is_onerow(outerrel);
+
+	outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(outerrel);
+	innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(innerrel);
+
+	clause_list = gather_mergeable_joinclauses(joinrel, outerrel, innerrel,
+											   restrictlist, jointype);
+
+	if (innerrel_keeps_unique(root, innerrel, outerrel, clause_list, true /* reverse */))
+	{
+		foreach(lc, outerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+
+			/*
+			 * The outer relation has one row, and its unique key is not
+			 * duplicated after the join, so the joinrel will still have
+			 * one row unless the jointype == JOIN_FULL.
+			 */
+			if (outer_onerow && jointype != JOIN_FULL)
+			{
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else if (outer_onerow)
+			{
+				/*
+				 * Full join case: the one row may become multiple rows and
+				 * multi_nullvals changes to true.  We also need to set the
+				 * exprs correctly since it is not one-row any more.
+				 */
+				ListCell *lc2;
+				foreach(lc2, get_exprs_from_uniquekey(joinrel, outerrel, NULL))
+				{
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(lfirst(lc2),
+																true, /* multi_nullvals */
+																false /* onerow */));
+				}
+			}
+			else
+			{
+				if (!ctx->uniquekey->multi_nullvals && jointype == JOIN_FULL)
+					/* Change multi_nullvals to true due to the full join. */
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(ctx->uniquekey->exprs,
+																true,
+																ctx->uniquekey->onerow));
+				else
+					/* Just reuse it */
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  ctx->uniquekey);
+			}
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+	{
+		foreach(lc, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+
+			/*
+			 * The inner relation has one row, and its unique key is not
+			 * duplicated after the join, so the joinrel will still have
+			 * one row unless the jointype is JOIN_FULL or JOIN_LEFT.
+			 */
+			if (inner_onerow && jointype != JOIN_FULL && jointype != JOIN_LEFT)
+			{
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else if (inner_onerow)
+			{
+				/*
+				 * Full join or left join case: the inner one row may become
+				 * multiple rows, and multi_nullvals becomes true.  We also
+				 * need to set the exprs correctly since it is not one-row
+				 * any more.
+				 */
+				ListCell *lc2;
+				foreach(lc2, get_exprs_from_uniquekey(joinrel, innerrel, NULL))
+				{
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(lfirst(lc2),
+																true, /* multi_nullvals */
+																false /* onerow */));
+				}
+			}
+			else
+			{
+				if (!ctx->uniquekey->multi_nullvals &&
+					(jointype == JOIN_FULL || jointype == JOIN_LEFT))
+					/* Need to change multi_nullvals to true due to the outer join. */
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(ctx->uniquekey->exprs,
+																true,
+																ctx->uniquekey->onerow));
+				else
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  ctx->uniquekey);
+
+			}
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	/*
+	 * The combination of UniqueKeys from both sides is unique as well,
+	 * regardless of the join type, but don't bother to add it if a subset
+	 * of it has already been added to the joinrel, or if it is not useful
+	 * for the joinrel.
+	 */
+	foreach(lc, outerrel_ukey_ctx)
+	{
+		UniqueKeyContext ctx1 = (UniqueKeyContext) lfirst(lc);
+		if (ctx1->added_to_joinrel || !ctx1->useful)
+			continue;
+		foreach(lc2, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx2 = (UniqueKeyContext) lfirst(lc2);
+			if (ctx2->added_to_joinrel || !ctx2->useful)
+				continue;
+			if (add_combined_uniquekey(joinrel, outerrel, innerrel,
+									   ctx1->uniquekey, ctx2->uniquekey,
+									   jointype))
+				/* If we set a onerow UniqueKey on the joinrel, we need no others. */
+				return;
+		}
+	}
+}
+
+
+/*
+ * convert_subquery_uniquekeys
+ *
+ * Convert the UniqueKeys of the subquery to the outer relation's notation.
+ */
+void convert_subquery_uniquekeys(PlannerInfo *root,
+								 RelOptInfo *currel,
+								 RelOptInfo *sub_final_rel)
+{
+	ListCell	*lc;
+
+	if (sub_final_rel->uniquekeys == NIL)
+		return;
+
+	if (relation_is_onerow(sub_final_rel))
+	{
+		add_uniquekey_for_onerow(currel);
+		return;
+	}
+
+	Assert(currel->subroot != NULL);
+
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		ListCell	*lc;
+		List	*exprs = NIL;
+		bool	ukey_useful = true;
+
+		/* One row case is handled above */
+		Assert(ukey->exprs != NIL);
+		foreach(lc, ukey->exprs)
+		{
+			Var *var;
+			TargetEntry *tle = tlist_member(lfirst(lc),
+											currel->subroot->processed_tlist);
+			if (tle == NULL)
+			{
+				ukey_useful = false;
+				break;
+			}
+			var = find_var_for_subquery_tle(currel, tle);
+			if (var == NULL)
+			{
+				ukey_useful = false;
+				break;
+			}
+			exprs = lappend(exprs, var);
+		}
+
+		if (ukey_useful)
+			currel->uniquekeys = lappend(currel->uniquekeys,
+										 makeUniqueKey(exprs,
+													   ukey->multi_nullvals,
+													   ukey->onerow));
+	}
+}
+
+/*
+ * innerrel_keeps_unique
+ *
+ * Check if a UniqueKey of the innerrel is still valid after the join.  It
+ * remains valid if all the exprs of some outerrel UniqueKey are matched,
+ * on the outer side, by mergeable join clauses in clause_list, since then
+ * each inner row can join with at most one outer row.
+ *
+ * Note: clause_list must be a list of mergeable RestrictInfos already.
+ */
+static bool
+innerrel_keeps_unique(PlannerInfo *root,
+					  RelOptInfo *outerrel,
+					  RelOptInfo *innerrel,
+					  List *clause_list,
+					  bool reverse)
+{
+	ListCell	*lc, *lc2, *lc3;
+
+	if (outerrel->uniquekeys == NIL || innerrel->uniquekeys == NIL)
+		return false;
+
+	/* Check if there is outerrel's uniquekey in mergeable clause. */
+	foreach(lc, outerrel->uniquekeys)
+	{
+		List	*outer_uq_exprs = lfirst_node(UniqueKey, lc)->exprs;
+		bool clauselist_matchs_all_exprs = true;
+		foreach(lc2, outer_uq_exprs)
+		{
+			Node *outer_uq_expr = lfirst(lc2);
+			bool find_uq_expr_in_clauselist = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *outer_expr;
+				if (reverse)
+					outer_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				else
+					outer_expr = rinfo->outer_is_left ? get_leftop(rinfo->clause) : get_rightop(rinfo->clause);
+				if (equal(outer_expr, outer_uq_expr))
+				{
+					find_uq_expr_in_clauselist = true;
+					break;
+				}
+			}
+			if (!find_uq_expr_in_clauselist)
+			{
+				/* No need to check the next exprs in the current uniquekey */
+				clauselist_matchs_all_exprs = false;
+				break;
+			}
+		}
+
+		if (clauselist_matchs_all_exprs)
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * relation_is_onerow
+ * Check if it is a one-row relation by checking UniqueKey.
+ */
+bool
+relation_is_onerow(RelOptInfo *rel)
+{
+	UniqueKey *ukey;
+	if (rel->uniquekeys == NIL)
+		return false;
+	ukey = linitial_node(UniqueKey, rel->uniquekeys);
+	if (ukey->onerow)
+	{
+		/* Some helpful sanity checks on the UniqueKey. */
+
+		/* 1. We will only store one UniqueKey for this rel. */
+		Assert(list_length(rel->uniquekeys) == 1);
+		/* 2. multi_nullvals must be false. */
+		Assert(!ukey->multi_nullvals);
+		/* 3. exprs must be NIL. */
+		Assert(ukey->exprs == NIL);
+
+	}
+	return ukey->onerow;
+}
+
+/*
+ * relation_has_uniquekeys_for
+ *		Returns true if we have proofs that 'rel' cannot return multiple rows with
+ *		the same values in each of 'exprs'.  Otherwise returns false.
+ */
+bool
+relation_has_uniquekeys_for(PlannerInfo *root, RelOptInfo *rel,
+							List *exprs, bool allow_multinulls)
+{
+	ListCell *lc;
+
+	/*
+	 * In the UniqueKey->onerow case, uniquekey->exprs is empty as well, so
+	 * we can't rely on list_is_subset to handle that special case.
+	 */
+	if (exprs == NIL)
+		return false;
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		if (ukey->multi_nullvals && !allow_multinulls)
+			continue;
+		if (list_is_subset(ukey->exprs, exprs))
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * Examine the rel's restriction clauses for usable var = const clauses
+ */
+static List*
+gather_mergeable_baserestrictlist(RelOptInfo *rel)
+{
+	List	*restrictlist = NIL;
+	ListCell	*lc;
+	foreach(lc, rel->baserestrictinfo)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
+
+		/*
+		 * Note: can_join won't be set for a restriction clause, but
+		 * mergeopfamilies will be if it has a mergejoinable operator and
+		 * doesn't contain volatile functions.
+		 */
+		if (restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * The clause certainly doesn't refer to anything but the given rel.
+		 * If either side is pseudoconstant then we can use it.
+		 */
+		if (bms_is_empty(restrictinfo->left_relids))
+		{
+			/* righthand side is inner */
+			restrictinfo->outer_is_left = true;
+		}
+		else if (bms_is_empty(restrictinfo->right_relids))
+		{
+			/* lefthand side is inner */
+			restrictinfo->outer_is_left = false;
+		}
+		else
+			continue;
+
+		/* OK, add to list */
+		restrictlist = lappend(restrictlist, restrictinfo);
+	}
+	return restrictlist;
+}
+
+
+/*
+ * gather_mergeable_joinclauses
+ */
+static List*
+gather_mergeable_joinclauses(RelOptInfo *joinrel,
+							 RelOptInfo *outerrel,
+							 RelOptInfo *innerrel,
+							 List *restrictlist,
+							 JoinType jointype)
+{
+	List	*clause_list = NIL;
+	ListCell	*lc;
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *restrictinfo = (RestrictInfo *)lfirst(lc);
+		if (IS_OUTER_JOIN(jointype) &&
+			RINFO_IS_PUSHED_DOWN(restrictinfo, joinrel->relids))
+			continue;
+
+		/* Ignore if it's not a mergejoinable clause */
+		if (!restrictinfo->can_join ||
+			restrictinfo->mergeopfamilies == NIL)
+			continue;			/* not mergejoinable */
+
+		/*
+		 * Check if clause has the form "outer op inner" or "inner op outer",
+		 * and if so mark which side is inner.
+		 */
+		if (!clause_sides_match_join(restrictinfo, outerrel->relids, innerrel->relids))
+			continue;			/* no good for these input relations. */
+
+		/* OK, add to list */
+		clause_list = lappend(clause_list, restrictinfo);
+	}
+	return clause_list;
+}
+
+
+/*
+ * Return true if every key column of the unique index is matched by an
+ * "indexcol = Const" clause in the restrictlist.
+ */
+static bool
+match_index_to_baserestrictinfo(IndexOptInfo *unique_ind, List *restrictlist)
+{
+	int c = 0;
+
+	/* A fast path to avoid the 2 loop scan. */
+	if (list_length(restrictlist) < unique_ind->ncolumns)
+		return false;
+
+	for(c = 0;  c < unique_ind->ncolumns; c++)
+	{
+		ListCell	*lc;
+		bool	found_in_restrictinfo = false;
+		foreach(lc, restrictlist)
+		{
+			RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+			Node	   *rexpr;
+
+			/*
+			 * The condition's equality operator must be a member of the
+			 * index opfamily, else it is not asserting the right kind of
+			 * equality behavior for this index.  We check this first
+			 * since it's probably cheaper than match_index_to_operand().
+			 */
+			if (!list_member_oid(rinfo->mergeopfamilies, unique_ind->opfamily[c]))
+				continue;
+
+			/*
+			 * XXX at some point we may need to check collations here too.
+			 * For the moment we assume all collations reduce to the same
+			 * notion of equality.
+			 */
+
+			/* OK, see if the condition operand matches the index key */
+			if (rinfo->outer_is_left)
+				rexpr = get_rightop(rinfo->clause);
+			else
+				rexpr = get_leftop(rinfo->clause);
+
+			if (match_index_to_operand(rexpr, c, unique_ind))
+			{
+				found_in_restrictinfo = true;
+				break;
+			}
+		}
+		if (!found_in_restrictinfo)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * add_uniquekey_from_index
+ *	Build a UniqueKey from the given unique index, but only if all of the
+ *	indexed columns are needed by this relation.
+ */
+static void
+add_uniquekey_from_index(RelOptInfo *rel, IndexOptInfo *unique_index)
+{
+	int	pos;
+	List	*exprs = NIL;
+	bool	multi_nullvals = false;
+
+	/*
+	 * Fast path: check if the indexed columns are used in this relation;
+	 * if not, return quickly.
+	 */
+	for(pos = 0; pos < unique_index->ncolumns; pos++)
+	{
+		int attno = unique_index->indexkeys[pos];
+		if (bms_is_empty(rel->attr_needed[attno - rel->min_attr]))
+			/* The indexed column is not needed in this relation. */
+			return;
+		if (!bms_is_member(attno - FirstLowInvalidHeapAttributeNumber,
+						   rel->notnullattrs))
+			multi_nullvals = true;
+	}
+
+	exprs = get_tlist_exprs(unique_index->indextlist, true);
+	rel->uniquekeys = lappend(rel->uniquekeys,
+							  makeUniqueKey(exprs,
+											multi_nullvals,
+											false));
+}
+
+
+/*
+ * add_uniquekey_for_onerow
+ * If we are sure the relation returns only one row, then all the columns
+ * are unique.  However, we don't need to create a UniqueKey for every
+ * column; we just set UniqueKey->onerow to true and leave exprs = NIL.
+ */
+void
+add_uniquekey_for_onerow(RelOptInfo *rel)
+{
+	rel->uniquekeys = list_make1(makeUniqueKey(NIL, /* No need to set exprs */
+											   false, /* onerow can't have multi_nullvals */
+											   true));
+
+}
+
+/*
+ * initililze_uniquecontext_for_joinrel
+ * Return a List of UniqueKeyContext for an inputrel
+ */
+static List *
+initililze_uniquecontext_for_joinrel(RelOptInfo *inputrel)
+{
+	List	*res = NIL;
+	ListCell *lc;
+	foreach(lc,  inputrel->uniquekeys)
+	{
+		UniqueKeyContext context;
+		context = palloc(sizeof(struct UniqueKeyContextData));
+		context->uniquekey = lfirst_node(UniqueKey, lc);
+		context->added_to_joinrel = false;
+		context->useful = true;
+		res = lappend(res, context);
+	}
+	return res;
+}
+
+/*
+ * clause_sides_match_join
+ *	  Determine whether a join clause is of the right form to use in this join.
+ *
+ * We already know that the clause is a binary opclause referencing only the
+ * rels in the current join.  The point here is to check whether it has the
+ * form "outerrel_expr op innerrel_expr" or "innerrel_expr op outerrel_expr",
+ * rather than mixing outer and inner vars on either side.  If it matches,
+ * we set the transient flag outer_is_left to identify which side is which.
+ */
+static bool
+clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
+						Relids innerrelids)
+{
+	if (bms_is_subset(rinfo->left_relids, outerrelids) &&
+		bms_is_subset(rinfo->right_relids, innerrelids))
+	{
+		/* lefthand side is outer */
+		rinfo->outer_is_left = true;
+		return true;
+	}
+	else if (bms_is_subset(rinfo->left_relids, innerrelids) &&
+			 bms_is_subset(rinfo->right_relids, outerrelids))
+	{
+		/* righthand side is outer */
+		rinfo->outer_is_left = false;
+		return true;
+	}
+	return false;				/* no good for these input relations */
+}
+
+/*
+ * get_exprs_from_uniquekey
+ *	Unify the way of getting the exprs from a one-row UniqueKey or a
+ * normal UniqueKey.  Returns a List of expr Lists.
+ *
+ * rel1: the relation whose exprs we want.
+ * ukey: the UniqueKey whose exprs we want.
+ */
+static List *
+get_exprs_from_uniquekey(RelOptInfo *joinrel, RelOptInfo *rel1, UniqueKey *ukey)
+{
+	ListCell *lc;
+	bool onerow = rel1 != NULL && relation_is_onerow(rel1);
+
+	List	*res = NIL;
+	Assert(onerow || ukey);
+	if (onerow)
+	{
+		/* Only care about the exprs which still exist in the joinrel. */
+		foreach(lc, joinrel->reltarget->exprs)
+		{
+			Bitmapset *relids = pull_varnos(lfirst(lc));
+			if (bms_is_subset(relids, rel1->relids))
+			{
+				res = lappend(res, list_make1(lfirst(lc)));
+			}
+		}
+	}
+	else
+	{
+		res = list_make1(ukey->exprs);
+	}
+	return res;
+}
+
+/*
+ * Partitioned table UniqueKeys.
+ * A unique key for a partitioned table is maintained as follows:
+ * 1. The index must be unique, as usual.
+ * 2. The index must contain the partition key.
+ * 3. The same index must exist on all the child rels; see
+ *    simple_indexinfo_equal for how we compare them.
+ */
+
+/* index_constains_partkey
+ * Return true if the index contains the partition key.
+ */
+static bool
+index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind)
+{
+	ListCell	*lc;
+	int	i;
+	Assert(IS_PARTITIONED_REL(partrel));
+
+	for(i = 0; i < partrel->part_scheme->partnatts; i++)
+	{
+		Node *part_expr = linitial(partrel->partexprs[i]);
+		bool found_in_index = false;
+		foreach(lc, ind->indextlist)
+		{
+			Expr *index_expr = lfirst_node(TargetEntry, lc)->expr;
+			if (equal(index_expr, part_expr))
+			{
+				found_in_index = true;
+				break;
+			}
+		}
+		if (!found_in_index)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * simple_indexinfo_equal
+ *
+ * Check whether the two indexes are the same.  The indexes here are COPIED
+ * from the childrels, with some tiny changes applied (see
+ * simple_copy_indexinfo_to_parent).
+ */
+static bool
+simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2)
+{
+	Size oid_cmp_len = sizeof(Oid) * ind1->ncolumns;
+	return ind1->ncolumns == ind2->ncolumns &&
+		ind1->unique == ind2->unique &&
+		memcmp(ind1->indexkeys, ind2->indexkeys, sizeof(int) * ind1->ncolumns) == 0 &&
+		memcmp(ind1->opfamily, ind2->opfamily, oid_cmp_len) == 0 &&
+		memcmp(ind1->opcintype, ind2->opcintype, oid_cmp_len) == 0 &&
+		memcmp(ind1->sortopfamily, ind2->sortopfamily, oid_cmp_len) == 0 &&
+		equal(ind1->indextlist, ind2->indextlist);
+}
+
+
+/*
+ * The macros below are used by simple_copy_indexinfo_to_parent, which is
+ * so customized that I don't want to put it into copyfuncs.c, so copy here.
+ */
+#define COPY_POINTER_FIELD(fldname, sz) \
+	do { \
+		Size	_size = (sz); \
+		newnode->fldname = palloc(_size); \
+		memcpy(newnode->fldname, from->fldname, _size); \
+	} while (0)
+
+#define COPY_NODE_FIELD(fldname) \
+	(newnode->fldname = copyObjectImpl(from->fldname))
+
+#define COPY_SCALAR_FIELD(fldname) \
+	(newnode->fldname = from->fldname)
+
+
+/*
+ * simple_copy_indexinfo_to_parent (from partition)
+ * Copy the IndexInfo from child relation to parent relation with some modification,
+ * which is used to test:
+ * 1. If the same index exists in all the childrels.
+ * 2. If the parentrel->reltarget/basicrestrict info matches this index.
+ */
+static IndexOptInfo *
+simple_copy_indexinfo_to_parent(RelOptInfo *parentrel,
+								IndexOptInfo *from)
+{
+	IndexOptInfo *newnode = makeNode(IndexOptInfo);
+
+	COPY_SCALAR_FIELD(ncolumns);
+	COPY_SCALAR_FIELD(nkeycolumns);
+	COPY_SCALAR_FIELD(unique);
+	COPY_SCALAR_FIELD(immediate);
+	/* We just need to know if it is NIL or not */
+	COPY_SCALAR_FIELD(indpred);
+	COPY_SCALAR_FIELD(predOK);
+	COPY_POINTER_FIELD(indexkeys, from->ncolumns * sizeof(int));
+	COPY_POINTER_FIELD(indexcollations, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opfamily, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opcintype, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(sortopfamily, from->ncolumns * sizeof(Oid));
+	COPY_NODE_FIELD(indextlist);
+
+	/*
+	 * Change the relid from the partition's relid to the parent's relid so
+	 * that the later index matching works.
+	 */
+	ChangeVarNodes((Node*) newnode->indextlist,
+				   from->rel->relid,
+				   parentrel->relid, 0);
+	newnode->rel = parentrel;
+	return newnode;
+}
+
+/*
+ * adjust_partition_unique_indexlist
+ *
+ * global_unique_indexes: at the beginning it contains the copied & modified
+ * unique indexes from the first partition.  We then check whether each index
+ * in it still exists in the following partitions; if not, we remove it.  At
+ * last it holds an index list which exists in all the partitions.
+ */
+static void
+adjust_partition_unique_indexlist(RelOptInfo *parentrel,
+								  RelOptInfo *childrel,
+								  List **global_unique_indexes)
+{
+	ListCell	*lc, *lc2;
+	foreach(lc, *global_unique_indexes)
+	{
+		IndexOptInfo	*g_ind = lfirst_node(IndexOptInfo, lc);
+		bool found_in_child = false;
+
+		foreach(lc2, childrel->indexlist)
+		{
+			IndexOptInfo   *p_ind = lfirst_node(IndexOptInfo, lc2);
+			IndexOptInfo   *p_ind_copy;
+			if (!p_ind->unique || !p_ind->immediate ||
+				(p_ind->indpred != NIL && !p_ind->predOK))
+				continue;
+			p_ind_copy = simple_copy_indexinfo_to_parent(parentrel, p_ind);
+			if (simple_indexinfo_equal(p_ind_copy, g_ind))
+			{
+				found_in_child = true;
+				break;
+			}
+		}
+
+		if (!found_in_child)
+			/* The same index doesn't exist on this childrel, so remove it. */
+			*global_unique_indexes = foreach_delete_current(*global_unique_indexes, lc);
+	}
+}
+
+/* Helper function for grouprel/distinctrel. */
+static void
+add_uniquekey_from_sortgroups(PlannerInfo *root, RelOptInfo *rel, List *sortgroups)
+{
+	Query *parse = root->parse;
+	List	*exprs;
+
+	/*
+	 * XXX: If there are any Vars of an upper query level, the semantics are
+	 * imprecise; should we avoid that?  levelsup = 1 is just a demo, maybe
+	 * we need to check every level other than 0, and if so, we need to
+	 * write another pull_var_walker.
+	 */
+	List	*upper_vars = pull_vars_of_level((Node*)sortgroups, 1);
+
+	if (upper_vars != NIL)
+		return;
+
+	exprs = get_sortgrouplist_exprs(sortgroups, parse->targetList);
+	rel->uniquekeys = lappend(rel->uniquekeys,
+							  makeUniqueKey(exprs,
+											false, /* sortgroupclause can't be multi_nullvals */
+											false));
+
+}
+
+
+/*
+ * add_combined_uniquekey
+ * The combination of UniqueKeys from both sides is a valid UniqueKey for
+ * the joinrel no matter the jointype.
+ * Note: this function is only called when neither side's UniqueKey alone
+ * remains valid after the join.
+ */
+bool
+add_combined_uniquekey(RelOptInfo *joinrel,
+					   RelOptInfo *outer_rel,
+					   RelOptInfo *inner_rel,
+					   UniqueKey *outer_ukey,
+					   UniqueKey *inner_ukey,
+					   JoinType jointype)
+{
+
+	ListCell	*lc1, *lc2;
+
+	/* If either side has multi_nullvals, the combined UniqueKey has it too. */
+	bool multi_nullvals = outer_ukey->multi_nullvals || inner_ukey->multi_nullvals;
+
+	/* An outer join implies multi_nullvals as well. */
+	multi_nullvals = multi_nullvals || IS_OUTER_JOIN(jointype);
+
+	/* This is the only case where we can get a onerow joinrel after the join. */
+	if  (outer_ukey->onerow && inner_ukey->onerow && jointype == JOIN_INNER)
+	{
+		add_uniquekey_for_onerow(joinrel);
+		return true;
+	}
+
+	foreach(lc1, get_exprs_from_uniquekey(joinrel, outer_rel, outer_ukey))
+	{
+		foreach(lc2, get_exprs_from_uniquekey(joinrel, inner_rel, inner_ukey))
+		{
+			List *exprs = list_concat_copy(lfirst_node(List, lc1), lfirst_node(List, lc2));
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+										  makeUniqueKey(exprs,
+														multi_nullvals,
+														false));
+		}
+	}
+	return false;
+}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e664eb18c0..654fa1fe35 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2389,6 +2389,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		add_path(final_rel, path);
 	}
 
+	simple_copy_uniquekeys(current_rel, final_rel);
+
 	/*
 	 * Generate partial paths for final_rel, too, if outer query levels might
 	 * be able to make use of them.
@@ -3899,6 +3901,8 @@ create_grouping_paths(PlannerInfo *root,
 	}
 
 	set_cheapest(grouped_rel);
+
+	populate_grouprel_uniquekeys(root, grouped_rel);
 	return grouped_rel;
 }
 
@@ -4616,7 +4620,7 @@ create_window_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(window_rel);
-
+	simple_copy_uniquekeys(input_rel, window_rel);
 	return window_rel;
 }
 
@@ -4913,7 +4917,7 @@ create_distinct_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(distinct_rel);
-
+	populate_distinctrel_uniquekeys(root, input_rel, distinct_rel);
 	return distinct_rel;
 }
 
@@ -5174,6 +5178,8 @@ create_ordered_paths(PlannerInfo *root,
 	 */
 	Assert(ordered_rel->pathlist != NIL);
 
+	simple_copy_uniquekeys(input_rel, ordered_rel);
+
 	return ordered_rel;
 }
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 951aed80e7..e94e92937c 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -689,6 +689,8 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	/* Undo effects of possibly forcing tuple_fraction to 0 */
 	root->tuple_fraction = save_fraction;
 
+	/* Add the UniqueKeys */
+	populate_unionrel_uniquekeys(root, result_rel);
 	return result_rel;
 }
 
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 31d9aedeeb..33f6ddc948 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -16,6 +16,7 @@
 
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 
 extern A_Expr *makeA_Expr(A_Expr_Kind kind, List *name,
@@ -105,4 +106,6 @@ extern GroupingSet *makeGroupingSet(GroupingSetKind kind, List *content, int loc
 
 extern VacuumRelation *makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols);
 
+extern UniqueKey* makeUniqueKey(List *exprs, bool multi_nullvals, bool onerow);
+
 #endif							/* MAKEFUNC_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b4e4..41110ed888 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -264,6 +264,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 907afd383d..575353d86c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -730,6 +730,7 @@ typedef struct RelOptInfo
 	QualCost	baserestrictcost;	/* cost of evaluating the above */
 	Index		baserestrict_min_security;	/* min security_level found in
 											 * baserestrictinfo */
+	List	   *uniquekeys;		/* List of UniqueKey */
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
 	bool		has_eclass_joins;	/* T means joininfo is incomplete */
@@ -1047,6 +1048,31 @@ typedef struct PathKey
 } PathKey;
 
 
+/*
+ * UniqueKey
+ *
+ * Represents the unique properties held by a RelOptInfo.
+ *
+ * exprs is a list of exprs which are unique on the current RelOptInfo.
+ * multi_nullvals: true means multiple null values may exist in these exprs,
+ * so strict uniqueness is not guaranteed in this case.  This field is
+ * necessary for remove_useless_joins & reduce_unique_semijoins, where we
+ * don't mind such duplicated NULL values.  It is set to true in 2 cases: one
+ * is a UniqueKey from a unique index whose indexed column is nullable; the
+ * other is the outer join case, see populate_joinrel_uniquekeys for details.
+ * onerow means the related relation returns only 1 row, e.g. a filter on a
+ * unique index, an Aggregate without a Group node, or a join of two 1-row
+ * relations.  As an optimization, when onerow is true we do not record every
+ * expr as a UniqueKey; we store exprs as NIL instead.
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+	List	   *exprs;
+	bool		multi_nullvals;
+	bool		onerow;
+} UniqueKey;
+
 /*
  * PathTarget
  *
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 14ea2766ad..621f54a9f8 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -528,6 +528,8 @@ extern bool list_member_ptr(const List *list, const void *datum);
 extern bool list_member_int(const List *list, int datum);
 extern bool list_member_oid(const List *list, Oid datum);
 
+extern bool list_is_subset(const List *members, const List *target);
+
 extern List *list_delete(List *list, void *datum);
 extern List *list_delete_ptr(List *list, void *datum);
 extern List *list_delete_int(List *list, int datum);
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 3e4171056e..9445141263 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -23,6 +23,7 @@
 #define OPTIMIZER_H
 
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 /*
  * We don't want to include nodes/pathnodes.h here, because non-planner
@@ -156,6 +157,7 @@ extern TargetEntry *get_sortgroupref_tle(Index sortref,
 										 List *targetList);
 extern TargetEntry *get_sortgroupclause_tle(SortGroupClause *sgClause,
 											List *targetList);
+extern Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle);
 extern Node *get_sortgroupclause_expr(SortGroupClause *sgClause,
 									  List *targetList);
 extern List *get_sortgrouplist_exprs(List *sgClauses,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 10b6e81079..651e48acbb 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -241,4 +241,39 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
 
+/*
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ */
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel,
+										List* unique_index_list);
+extern void populate_partitionedrel_uniquekeys(PlannerInfo *root,
+												RelOptInfo *rel,
+												List *childrels);
+extern void populate_distinctrel_uniquekeys(PlannerInfo *root,
+											RelOptInfo *inputrel,
+											RelOptInfo *distinctrel);
+extern void populate_grouprel_uniquekeys(PlannerInfo *root,
+										 RelOptInfo *grouprel);
+extern void populate_unionrel_uniquekeys(PlannerInfo *root,
+										  RelOptInfo *unionrel);
+extern void simple_copy_uniquekeys(RelOptInfo *oldrel,
+								   RelOptInfo *newrel);
+extern void convert_subquery_uniquekeys(PlannerInfo *root,
+										RelOptInfo *currel,
+										RelOptInfo *sub_final_rel);
+extern void populate_joinrel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *rel1,
+										RelOptInfo *rel2,
+										List *restrictlist,
+										JoinType jointype);
+
+extern bool relation_has_uniquekeys_for(PlannerInfo *root,
+										RelOptInfo *rel,
+										List *exprs,
+										bool allow_multinulls);
+extern bool relation_is_onerow(RelOptInfo *rel);
+
 #endif							/* PATHS_H */
-- 
2.21.0

v7-0006-Join-removal-at-run-time-with-UniqueKey.patch (attachment, application/octet-stream)
From 24ccf9817a9834001bf00a9fd99b3766fccb5f66 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Thu, 7 May 2020 08:36:29 +0800
Subject: [PATCH v7 6/6] Join removal at run-time with UniqueKey.

We delay join removal until build_join_rel time to get
the benefit of UniqueKey and bypass some limitations of the current
remove_useless_joins.  However, this new strategy can't be a
replacement for the current one, since at that point it only knows
about 2 relations, so it can't handle something like

SELECT a.* FROM a LEFT JOIN (b left join c on b.c_id = c.id)
ON (a.b_id = b.id);

which can be handled by the current strategy.
---
 src/backend/optimizer/path/joinrels.c |   8 +-
 src/backend/optimizer/util/relnode.c  | 263 +++++++++++++++++++++++++-
 src/include/optimizer/pathnode.h      |   4 +-
 src/test/regress/expected/join.out    |  39 ++++
 src/test/regress/sql/join.sql         |  26 +++
 5 files changed, 333 insertions(+), 7 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index c0d2332caf..1e6816d1b1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -691,6 +691,7 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 	SpecialJoinInfo sjinfo_data;
 	RelOptInfo *joinrel;
 	List	   *restrictlist;
+	bool	innerrel_removed = false;
 
 	/* We should never try to join two overlapping sets of rels. */
 	Assert(!bms_overlap(rel1->relids, rel2->relids));
@@ -744,7 +745,7 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 	 * goes with this particular joining.
 	 */
 	joinrel = build_join_rel(root, joinrelids, rel1, rel2, sjinfo,
-							 &restrictlist);
+							 &restrictlist, &innerrel_removed);
 
 	/*
 	 * If we've already proven this join is empty, we needn't consider any
@@ -756,9 +757,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 		return joinrel;
 	}
 
+	if (!innerrel_removed)
 	/* Add paths to the join relation. */
-	populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
-								restrictlist);
+		populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
+									restrictlist);
 
 	bms_free(joinrelids);
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 591a0a3957..ac086930d6 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -22,6 +22,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/inherit.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/placeholder.h"
@@ -74,6 +75,11 @@ static void build_child_join_reltarget(PlannerInfo *root,
 									   int nappinfos,
 									   AppendRelInfo **appinfos);
 
+static bool join_canbe_removed(PlannerInfo *root,
+							   SpecialJoinInfo *sjinfo,
+							   RelOptInfo *joinrel,
+							   RelOptInfo *innerrel,
+							   List *restrictlist);
 
 /*
  * setup_simple_rel_arrays
@@ -580,7 +586,8 @@ build_join_rel(PlannerInfo *root,
 			   RelOptInfo *outer_rel,
 			   RelOptInfo *inner_rel,
 			   SpecialJoinInfo *sjinfo,
-			   List **restrictlist_ptr)
+			   List **restrictlist_ptr,
+			   bool *innerrel_removed)
 {
 	RelOptInfo *joinrel;
 	List	   *restrictlist;
@@ -719,6 +726,64 @@ build_join_rel(PlannerInfo *root,
 	 */
 	joinrel->has_eclass_joins = has_relevant_eclass_joinclause(root, joinrel);
 
+	if (join_canbe_removed(root, sjinfo,
+						   joinrel, inner_rel,
+						   restrictlist))
+	{
+		ListCell *lc;
+
+		joinrel->rows = outer_rel->rows;
+		joinrel->consider_startup = outer_rel->consider_param_startup;
+		joinrel->consider_param_startup = outer_rel->consider_param_startup;
+		joinrel->consider_parallel = outer_rel->consider_parallel;
+
+		/* Rely on the projection path to reduce the tlist. */
+		joinrel->reltarget = outer_rel->reltarget;
+
+		joinrel->direct_lateral_relids = outer_rel->direct_lateral_relids;
+		joinrel->lateral_relids = outer_rel->lateral_relids;
+
+		joinrel->unique_for_rels = outer_rel->unique_for_rels;
+		joinrel->non_unique_for_rels = outer_rel->non_unique_for_rels;
+		joinrel->baserestrictinfo = outer_rel->baserestrictinfo;
+		joinrel->baserestrictcost = outer_rel->baserestrictcost;
+		joinrel->baserestrict_min_security = outer_rel->baserestrict_min_security;
+		joinrel->uniquekeys = outer_rel->uniquekeys;
+		joinrel->consider_partitionwise_join = outer_rel->consider_partitionwise_join;
+		joinrel->top_parent_relids = outer_rel->top_parent_relids;
+
+		/* Some scan path need to know which base relation to scan, it uses the relid
+		 * field, so we have to use the outerrel->relid.
+		 */
+		joinrel->relid = outer_rel->relid;
+
+		/* Almost the same paths as above, it assert the rte_kind is RTE_RELATION, so
+		 * we need to set as same as outerrel as well
+		 */
+		joinrel->rtekind = RTE_RELATION;
+
+		/* Make sure the path->parent point to current joinrel, can't update it in-place. */
+		foreach(lc, outer_rel->pathlist)
+		{
+			Size sz = size_of_path(lfirst(lc));
+			Path *path = palloc(sz);
+			memcpy(path, lfirst(lc), sz);
+			path->parent = joinrel;
+			add_path(joinrel, path);
+		}
+
+		foreach(lc, joinrel->partial_pathlist)
+		{
+			Size sz = size_of_path(lfirst(lc));
+			Path *path = palloc(sz);
+			memcpy(path, lfirst(lc), sz);
+			path->parent = joinrel;
+			add_partial_path(joinrel, path);
+		}
+		*innerrel_removed = true;
+	}
+	else
+	{
 	/* Store the partition information. */
 	build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
 								 sjinfo->jointype);
@@ -747,7 +812,7 @@ build_join_rel(PlannerInfo *root,
 		is_parallel_safe(root, (Node *) restrictlist) &&
 		is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
 		joinrel->consider_parallel = true;
-
+	}
 	/* Add the joinrel to the PlannerInfo. */
 	add_join_rel(root, joinrel);
 
@@ -760,11 +825,18 @@ build_join_rel(PlannerInfo *root,
 	if (root->join_rel_level)
 	{
 		Assert(root->join_cur_level > 0);
-		Assert(root->join_cur_level <= bms_num_members(joinrel->relids));
+	   // Assert(root->join_cur_level <= bms_num_members(joinrel->relids));
 		root->join_rel_level[root->join_cur_level] =
 			lappend(root->join_rel_level[root->join_cur_level], joinrel);
 	}
 
+	/* elog(INFO, "lev-%d Build JoinRel (%s) with %s and %s, inner is removed: %d", */
+	/*	 root->join_cur_level, */
+	/*	 bmsToString(joinrelids), */
+	/*	 bmsToString(outer_rel->relids), */
+	/*	 bmsToString(inner_rel->relids), */
+	/*	 joinrel->removed); */
+
 	return joinrel;
 }
 
@@ -2028,3 +2100,188 @@ build_child_join_reltarget(PlannerInfo *root,
 	childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
 	childrel->reltarget->width = parentrel->reltarget->width;
 }
+
+static bool
+join_canbe_removed(PlannerInfo *root,
+				   SpecialJoinInfo *sjinfo,
+				   RelOptInfo *joinrel,
+				   RelOptInfo *innerrel,
+				   List *restrictlist)
+{
+	Bitmapset	*vars;
+	List	*exprs = NIL;
+	ListCell	*lc;
+	Bitmapset	*tmp;
+	bool	res;
+
+	if (sjinfo->jointype != JOIN_LEFT)
+		return false;
+
+	if (innerrel->uniquekeys == NIL)
+		return false;
+
+	/*
+	 * Check if there is any innerrel's cols can't be removed.
+	 */
+
+	vars = pull_varnos((Node*)joinrel->reltarget->exprs);
+	tmp = bms_intersect(vars, innerrel->relids);
+	if (!bms_is_empty(tmp))
+		return false;
+
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+		if (rinfo->can_join)
+		{
+			if (rinfo->mergeopfamilies != NIL)
+			{
+				if (bms_is_subset(rinfo->left_relids, innerrel->relids))
+					exprs = lappend(exprs, get_leftop(rinfo->clause));
+				else if (bms_is_subset(rinfo->right_relids, innerrel->relids))
+					exprs = lappend(exprs, get_rightop(rinfo->clause));
+				else
+					Assert(false);
+			}
+			else
+				/* Not mergeable join clause, we have to keep it */
+				return false;
+		}
+		else
+		{
+			/*
+			 * If the rinfo is not joinable clause, and it is not pushed down to
+			 * baserelation's basicrestrictinfo. so it must be in ON clauses.
+			 * Example: SELECT .. FROM t1 left join t2 on t1.a = 10;
+			 * In this case we can't remove the inner join as well.
+			 */
+			return false;
+		}
+	}
+	res =  relation_has_uniquekeys_for(root, innerrel, exprs, true);
+	return res;
+}
+
+
+size_t
+size_of_path(Path *path)
+{
+	switch(path->type)
+	{
+		case T_Path:
+			return sizeof(Path);
+		case T_IndexPath:
+			return sizeof(IndexPath);
+		case T_BitmapHeapPath:
+			return sizeof(BitmapHeapPath);
+		case T_TidPath:
+			return sizeof(TidPath);
+		case T_SubqueryScanPath:
+			return sizeof(SubqueryScanPath);
+		case T_ForeignPath:
+			return sizeof(ForeignPath);
+		case T_CustomPath:
+			return sizeof(CustomPath);
+
+
+		case T_NestPath:
+			return sizeof(NestPath);
+
+
+		case T_MergePath:
+			return sizeof(MergePath);
+
+
+		case T_HashPath:
+			return sizeof(HashPath);
+
+
+		case T_AppendPath:
+			return sizeof(AppendPath);
+
+
+		case T_MergeAppendPath:
+			return sizeof(MergeAppendPath);
+
+
+		case T_GroupResultPath:
+			return sizeof(GroupResultPath);
+
+
+		case T_MaterialPath:
+			return sizeof(MaterialPath);
+
+
+		case T_UniquePath:
+			return sizeof(UniquePath);
+
+
+		case T_GatherPath:
+			return sizeof(GatherPath);
+
+
+		case T_GatherMergePath:
+			return sizeof(GatherMergePath);
+
+
+		case T_ProjectionPath:
+			return sizeof(ProjectionPath);
+
+
+		case T_ProjectSetPath:
+			return sizeof(ProjectSetPath);
+
+
+		case T_SortPath:
+			return sizeof(SortPath);
+
+
+		case T_IncrementalSortPath:
+			return sizeof(IncrementalSortPath);
+
+
+		case T_GroupPath:
+			return sizeof(GroupPath);
+
+
+		case T_UpperUniquePath:
+			return sizeof(UpperUniquePath);
+
+
+		case T_AggPath:
+			return sizeof(AggPath);
+
+
+		case T_GroupingSetsPath:
+			return sizeof(GroupingSetsPath);
+
+
+		case T_MinMaxAggPath:
+			return sizeof(MinMaxAggPath);
+
+
+		case T_WindowAggPath:
+			return sizeof(WindowAggPath);
+
+
+		case T_SetOpPath:
+			return sizeof(SetOpPath);
+
+
+		case T_RecursiveUnionPath:
+			return sizeof(RecursiveUnionPath);
+
+
+		case T_LockRowsPath:
+			return sizeof(LockRowsPath);
+		case T_ModifyTablePath:
+			return sizeof(ModifyTablePath);
+		case T_LimitPath:
+			return sizeof(LimitPath);
+		default:
+			elog(ERROR, "unrecognized path type: %s",
+				 nodeToString(&path->type));
+			break;
+	}
+	return 0;
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 715a24ad29..254961b2b4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -294,7 +294,8 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
 								  RelOptInfo *outer_rel,
 								  RelOptInfo *inner_rel,
 								  SpecialJoinInfo *sjinfo,
-								  List **restrictlist_ptr);
+								  List **restrictlist_ptr,
+								  bool *innerrel_removed);
 extern Relids min_join_parameterization(PlannerInfo *root,
 										Relids joinrelids,
 										RelOptInfo *outer_rel,
@@ -321,4 +322,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
 										RelOptInfo *parent_joinrel, List *restrictlist,
 										SpecialJoinInfo *sjinfo, JoinType jointype);
 
+extern size_t size_of_path(Path *path);
 #endif							/* PATHNODE_H */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 8378936eda..b59c9a73ae 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6283,3 +6283,42 @@ where exists (select 1 from j3
 (13 rows)
 
 drop table j3;
+create table m1 (a int primary key,  b int, c int);
+create table m2 (a int primary key,  b int, c int);
+create table m3 (a int primary key,  b int, c int);
+explain (verbose, costs off)
+select  t1.a
+from m3 t1
+left join (select m1.a from m1, m2 where m1.b = m2.a) t2
+on (t1.a = t2.a);
+        QUERY PLAN        
+--------------------------
+ Seq Scan on public.m3 t1
+   Output: t1.a
+(2 rows)
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on (m1.a = m2.a)
+and m1.b in (select b from m3);
+         QUERY PLAN         
+----------------------------
+ Seq Scan on public.m1
+   Output: m1.a, m1.b, m1.c
+(2 rows)
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on m1.b = m2.a
+and m2.b in (select b from m3);
+         QUERY PLAN         
+----------------------------
+ Seq Scan on public.m1
+   Output: m1.a, m1.b, m1.c
+(2 rows)
+
+drop table m1;
+drop table m2;
+drop table m3;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3312542411..317354547d 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2169,3 +2169,29 @@ where exists (select 1 from j3
       and t1.unique1 < 1;
 
 drop table j3;
+
+create table m1 (a int primary key,  b int, c int);
+create table m2 (a int primary key,  b int, c int);
+create table m3 (a int primary key,  b int, c int);
+
+explain (verbose, costs off)
+select  t1.a
+from m3 t1
+left join (select m1.a from m1, m2 where m1.b = m2.a) t2
+on (t1.a = t2.a);
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on (m1.a = m2.a)
+and m1.b in (select b from m3);
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on m1.b = m2.a
+and m2.b in (select b from m3);
+
+drop table m1;
+drop table m2;
+drop table m3;
-- 
2.21.0

#25 Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andy Fan (#24)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Hi Andy,
Sorry for the delay in review. Your earlier patches are very large, and it
takes some time to review them. I haven't finished reviewing them, but here are
whatever comments I have so far on the previous set of patches. Please see if
any of them are useful for the new set.

+/*
+ * Return true iff there is an equal member in target for every
+ * member in members

Suggest reword: return true iff every entry in "members" list is also present
in the "target" list. This function doesn't care about multi-sets, so please
mention that in the prologue clearly.

+
+    if (root->parse->hasTargetSRFs)
+        return;

Why? A relation's uniqueness may be useful well before we work on SRFs.

+
+    if (baserel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+        /*
+         * Set UniqueKey on member rel is useless, we have to recompute it at
+         * upper level, see populate_partitionedrel_uniquekeys for reference
+         */
+        return;

Handling these here might help in bottom up approach. We annotate each
partition here and then annotate partitioned table based on the individual
partitions. Same approach can be used for partitioned join produced by
partitionwise join.

+        /*
+         * If the unique index doesn't contain partkey, then it is unique
+         * on this partition only, so it is useless for us.
+         */

Not really. It could help partitionwise join.

+
+    /* Now we have the unique index list which as exactly same on all
childrels,
+     * Set the UniqueIndex just like it is non-partition table
+     */

I think it's better to annotate each partition with whatever unique index it
has whether or not global. That will help partitionwise join, partitionwise
aggregate/group etc.

+    /* A Normal group by without grouping set */
+    if (parse->groupClause)
+        add_uniquekey_from_sortgroups(root,
+                                      grouprel,
+                                      root->parse->groupClause);

Those keys which are part of groupClause and also form unique keys in the input
relation, should be recorded as unique keys in group rel. Knowing the minimal
set of keys allows more optimizations.
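For instance (an illustrative sketch; the table and column names are
hypothetical):

```sql
CREATE TABLE t (pk int PRIMARY KEY, b int, c int);

-- The input relation is already unique on (pk), so the group rel's
-- UniqueKey should be recorded as (pk) alone rather than (pk, b).
-- Knowing the minimal key would let later steps, e.g. a DISTINCT pk
-- on top of this, be proven redundant.
SELECT pk, b, sum(c) FROM t GROUP BY pk, b;
```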

+
+    foreach(lc,  unionrel->reltarget->exprs)
+    {
+        exprs = lappend(exprs, lfirst(lc));
+        colnos = lappend_int(colnos, i);
+        i++;
+    }

This should be only possible when it's not UNION ALL. We should add some assert
or protection for that.

+
+    /* Fast path */
+    if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+        return;
+
+    outer_is_onerow = relation_is_onerow(outerrel);
+    inner_is_onerow = relation_is_onerow(innerrel);
+
+    outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
outerrel);
+    innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
innerrel);
+
+    clause_list = gather_mergeable_joinclauses(joinrel, outerrel, innerrel,
+                                               restrictlist, jointype);

Something similar happens in select_mergejoin_clauses(). At least from the
first reading of this patch, I get an impression that all these functions which
are going through clause lists and index lists should be merged into other
functions which go through these lists hunting for some information useful to
the optimizer.

+
+
+    if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+    {
+        foreach(lc, innerrel_ukey_ctx)
+        {
+            UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+            if (!list_is_subset(ctx->uniquekey->exprs,
joinrel->reltarget->exprs))
+            {
+                /* The UniqueKey on baserel is not useful on the joinrel */

A joining relation need not be a base rel always, it could be a join rel as
well.

+                ctx->useful = false;
+                continue;
+            }
+            if ((jointype == JOIN_LEFT || jointype == JOIN_FULL) &&
!ctx->uniquekey->multi_nullvals)
+            {
+                /* Change the multi_nullvals to true at this case */

Need a comment explaining this. Generally, I feel, this and other functions in
this file need good comments explaining the logic esp. "why" instead of "what".

+            else if (inner_is_onerow)
+            {
+                /* Since rows in innerrel can't be duplicated AND if
innerrel is onerow,
+                 * the join result will be onerow also as well. Note:
onerow implies
+                 * multi_nullvals = false.

I don't understand this comment. Why is there only one row in the other
relation which can join to this row?

+    }
+    /*
+     * Calculate max_colno in subquery. In fact we can check this with
+     * list_length(sub_final_rel->reltarget->exprs), However, reltarget
+     * is not set on UPPERREL_FINAL relation, so do it this way
+     */

Should/can we use the same logic to convert an expression in the subquery into
a Var of the outer query as in convert_subquery_pathkeys()? If the subquery
doesn't have a reltarget set, we should be able to use the reltarget of any of
its paths, since all of those should match the positions across the subquery and
the outer query.

Will continue reviewing your new set of patches as time permits.

On Thu, May 7, 2020 at 7:02 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

I just uploaded the v7 version and split it into smaller commits for easier
review/merge. I also maintain an up-to-date README.uniquekey document, since
things may change during discussion or in later code.

Here is the simple introduction of each commit.

====
1. v7-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch

This commit adds the notnullattrs attribute to RelOptInfo, which gathers the
information from both the catalog and the user's query.

2. v7-0002-Introuduce-RelOptInfo.uniquekeys-attribute.patch

This commit just adds the uniquekeys attribute to RelOptInfo and maintains it
at every stage. The upper-level code is not changed by this yet.

Some changes of this part in v7:
1) Removed the UniqueKey.positions attribute. In the past it was used in
convert_subquery_uniquekeys, but we don't actually need it (and I maintained it
incorrectly before). Now I build the relationship between the outer Var and the
subquery's TargetList with outerrel.subquery.processed_tlist.
2) A onerow UniqueKey (exprs = NIL) needs to be converted to a normal UniqueKey
(exprs != NIL) if the relation is not one-row any more. This may happen with
some outer joins.

3. v7-0003-Refactor-existing-uniqueness-related-code-to-use-.patch

Refactor the existing functions like innerrel_is_unique/rel_is_distinct_for to
use UniqueKey, and postpone the calls of remove_useless_joins and
reduce_unique_semijoins so they can use the new implementation.

4. v7-0004-Remove-distinct-node-AggNode-if-the-input-is-uniq.patch

Remove the Distinct node if the result is already distinct. Remove the Agg node
if the group-by clause is already unique AND there is no aggregate function in
the query.

5. v7-0005-If-the-group-by-clause-is-unique-and-we-have-aggr.patch

If the group-by clause is unique and the query has an aggregate function, we
use the AGG_SORTED strategy, but without actually sorting, since there is only
one row in each group.

6. v7-0006-Join-removal-at-run-time-with-UniqueKey.patch

This commit runs join removal at build_join_rel time, when it can make full use
of the UniqueKey. It can handle some more cases, and I added some new test
cases to join.sql. However, it can't be a replacement for the current strategy:
there are some cases the current one handles that the new one can't, like

SELECT a.* FROM a LEFT JOIN (b left join c on b.c_id = c.id) ON (a.b_id = b.id);

during the join of a & b, the join can't be removed since b.id is still needed
later. Only afterwards do we learn that b.id can be removed as well, but by
then it is too late to remove the earlier join.
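A concrete schema for the example above (the table definitions are
hypothetical):

```sql
CREATE TABLE a (id int PRIMARY KEY, b_id int);
CREATE TABLE b (id int PRIMARY KEY, c_id int);
CREATE TABLE c (id int PRIMARY KEY);

-- The existing remove_useless_joins can reduce this to a plain scan of
-- "a": the b/c join is removed first, after which the a/b join becomes
-- removable too.  A strategy that only ever sees two relations at a
-- time keeps the a/b join, because b.id still looks needed for c.
SELECT a.* FROM a LEFT JOIN (b LEFT JOIN c ON b.c_id = c.id)
    ON a.b_id = b.id;
```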

At the implementation level, the main idea is: if join_canbe_removed() holds,
we copy the pathlist from the outerrel to the joinrel. There are several items
to handle:

1. To keep the overall join_search_one_level working, we have to keep the
joinrel even when the innerrel is removed (rather than discard the joinrel).
2. If the innerrel can be removed, we don't need to build a pathlist for the
joinrel; we just reuse the pathlist from the outerrel. However, many places
assert rel->pathlist[*]->parent == rel, so after copying the pathlist we have
to change the parent to the joinrel.
3. During create_plan, some paths on an RTE_RELATION need to know the relation
Oid via path->parent->relid, so we have to use outerrel->relid to overwrite
joinrel->relid, which was 0 before.
4. Almost the same set of paths as in item 3 usually asserts
best_path->parent->rtekind == RTE_RELATION; such paths may now appear in the
joinrel, so I used outerrel->rtekind to overwrite joinrel->rtekind.
5. I guess there are some dependencies between path->pathtarget and
rel->reltarget; since we reuse the pathlist of the outerrel, I used
outerrel->reltarget as well. If the join can be removed, I guess
list_length(outerrel->reltarget->exprs) >= list_length(joinrel->reltarget->exprs);
we can rely on the ProjectionPath to reduce the tlist.

My patches are based on the latest commit, fb544735f1.

Best Regards
Andy Fan

--
Best Wishes,
Ashutosh Bapat

#26 Andy Fan
zhihui.fan1213@gmail.com
In reply to: Ashutosh Bapat (#25)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Hi Ashutosh:

I appreciate your comments!

On Thu, May 7, 2020 at 7:26 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
wrote:

Hi Andy,
Sorry for delay in review.

I understand no one has an obligation to do that, and it takes a lot of a
reviewer's time and more, so I really appreciate it! I hope I can provide
similarly kind help to others in the future as well.

Your earlier patches are very large and it
requires some time to review those. I didn't finish reviewing those
but here are whatever comments I have till now on the previous set of
patches. Please see if any of those are useful to the new set.

Yes, I just realized the size as well, so I split them into smaller commits,
and each commit builds and runs make check successfully on its own.

All of your comments are still valid except the last one
(convert_subquery_uniquekeys), which has been fixed in the v7 version.

+/*
+ * Return true iff there is an equal member in target for every
+ * member in members

Suggest reword: return true iff every entry in "members" list is also
present
in the "target" list.

Will do, thanks!

This function doesn't care about multi-sets, so please
mention that in the prologue clearly.

+
+    if (root->parse->hasTargetSRFs)
+        return;

Why? A relation's uniqueness may be useful well before we work on SRFs.

It looks like I misunderstood when SRFs are executed. I will fix this in v8.

+

+    if (baserel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+        /*
+         * Set UniqueKey on member rel is useless, we have to recompute
it at
+         * upper level, see populate_partitionedrel_uniquekeys for
reference
+         */
+        return;

Handling these here might help in bottom up approach. We annotate each
partition here and then annotate partitioned table based on the individual
partitions. Same approach can be used for partitioned join produced by
partitionwise join.

+        /*
+         * If the unique index doesn't contain partkey, then it is unique
+         * on this partition only, so it is useless for us.
+         */

Not really. It could help partitionwise join.

+
+    /* Now we have the unique index list which as exactly same on all
childrels,
+     * Set the UniqueIndex just like it is non-partition table
+     */

I think it's better to annotate each partition with whatever unique index
it
has whether or not global. That will help partitionwise join, partitionwise
aggregate/group etc.

Excellent catch! All 3 items above are related to partitionwise join; I need
some time to check how to handle that.

+    /* A Normal group by without grouping set */
+    if (parse->groupClause)
+        add_uniquekey_from_sortgroups(root,
+                                      grouprel,
+                                      root->parse->groupClause);

Those keys which are part of groupClause and also form unique keys in the
input
relation, should be recorded as unique keys in group rel. Knowing the
minimal
set of keys allows more optimizations.

This is a very valid point. I ignored this because I wanted to remove the
AggNode entirely if part of the groupClause is unique; however, that doesn't
happen if there is an aggregate call in the query.
+
+    foreach(lc,  unionrel->reltarget->exprs)
+    {
+        exprs = lappend(exprs, lfirst(lc));
+        colnos = lappend_int(colnos, i);
+        i++;
+    }

This should be only possible when it's not UNION ALL. We should add some
assert
or protection for that.

OK; actually I call this function in generate_union_paths, which handles the
UNION case only. I will add the Assert anyway.
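For reference, the distinction the Assert would guard against (illustrative
queries; t1 and t2 are hypothetical):

```sql
-- UNION eliminates duplicates, so the whole target list of the union
-- rel forms a UniqueKey:
SELECT a FROM t1 UNION SELECT a FROM t2;

-- UNION ALL keeps duplicates, so no such UniqueKey may be derived:
SELECT a FROM t1 UNION ALL SELECT a FROM t2;
```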

+
+    /* Fast path */
+    if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+        return;
+
+    outer_is_onerow = relation_is_onerow(outerrel);
+    inner_is_onerow = relation_is_onerow(innerrel);
+
+    outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
outerrel);
+    innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
innerrel);
+
+    clause_list = gather_mergeable_joinclauses(joinrel, outerrel,
innerrel,
+                                               restrictlist, jointype);

Something similar happens in select_mergejoin_clauses().

I hadn't realized this before. I will refactor this code accordingly, if
necessary, after reading that.

At least from the
first reading of this patch, I get an impression that all these functions
which
are going through clause lists and index lists should be merged into other
functions which go through these lists hunting for some information useful
to
the optimizer.

+

+
+    if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list,
false))
+    {
+        foreach(lc, innerrel_ukey_ctx)
+        {
+            UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+            if (!list_is_subset(ctx->uniquekey->exprs,
joinrel->reltarget->exprs))
+            {
+                /* The UniqueKey on baserel is not useful on the joinrel
*/

A joining relation need not be a base rel always, it could be a join rel as
well.

Good catch.

+                ctx->useful = false;
+                continue;
+            }
+            if ((jointype == JOIN_LEFT || jointype == JOIN_FULL) &&
!ctx->uniquekey->multi_nullvals)
+            {
+                /* Change the multi_nullvals to true at this case */

Need a comment explaining this. Generally, I feel, this and other
functions in
this file need good comments explaining the logic esp. "why" instead of
"what".

Exactly.

+ else if (inner_is_onerow)

+            {
+                /* Since rows in innerrel can't be duplicated AND if
innerrel is onerow,
+                 * the join result will be onerow also as well. Note:
onerow implies
+                 * multi_nullvals = false.

I don't understand this comment. Why is there only one row in the other
relation which can join to this row?

I guess you may have missed the onerow special case, if I understand correctly.
inner_is_onerow means something like "SELECT xx FROM t1 WHERE uk = 1".
"innerrel can't be duplicated" means a join clause like t1.y = t2.pk, so the
final result is one row as well. An example of the overall query is
SELECT .. FROM t1, t2 WHERE t1.y = t2.pk;

I explained more about onerow in the v7 README.uniquekey document; let me just
copy it here.

===
1. What is UniqueKey?
....
onerow is also a kind of UniqueKey, meaning the RelOptInfo will have at most 1
row; it has stronger semantics than the others. In SELECT uk FROM t, uk is a
normal unique key and may take different values; in SELECT colx FROM t WHERE
uk = const, colx is unique AND has only 1 value. This field can be used for
innerrel_is_unique, and also as an optimization for this case: we don't need to
maintain multiple UniqueKeys, we just maintain one with onerow = true and
exprs = NIL.

onerow is set to true in only 2 cases right now: 1) SELECT .. FROM t WHERE
uk = 1;  2) SELECT aggref(xx) FROM t;  -- without GROUP BY.
===

===
2. How is it maintained?
....
More considerations about onerow:
1. Even if a relation has one row and it can't be duplicated, it may still
contain multi_nullvals after an outer join.
2. If either UniqueKey can be duplicated after the join, we can get one row
only when both sides are one row AND there is no outer join.
3. Whenever the onerow UniqueKey is no longer valid, we need to convert the
onerow UniqueKey to a normal unique key, since we don't store exprs for a
one-row relation; get_exprs_from_uniquekeys will be used here.
===
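Consideration 1 above, as an illustrative query (the names are hypothetical):

```sql
-- The subquery s is onerow (uk is unique and filtered to a constant),
-- but after the LEFT JOIN its column can come out NULL for unmatched
-- t1 rows, so multi_nullvals must be set on the surviving UniqueKey.
SELECT t1.*, s.colx
FROM t1 LEFT JOIN (SELECT colx FROM t2 WHERE uk = 1) s
    ON t1.y = s.colx;
```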

Also, in the v7 implementation, the code structure is clearer. :)

+    }
+    /*
+     * Calculate max_colno in subquery. In fact we can check this with
+     * list_length(sub_final_rel->reltarget->exprs), However, reltarget
+     * is not set on UPPERREL_FINAL relation, so do it this way
+     */

Should/can we use the same logic to convert an expression in the subquery
into a Var of the outer query as in convert_subquery_pathkeys()?

Yes, my previous implementation was actually wrong; it should be fixed
in v7.

If the subquery doesn't have a reltarget set, we should be able to use the
reltarget of any of its paths, since all of those should match the
positions across the subquery and the outer query.

But I think it should be rel->subroot->processed_tlist rather than
reltarget? Actually I am still a bit uneasy about
rel->subroot->processed_tlist for some DML cases, where the
processed_tlist is different, and I still haven't figured out its impact.

Will continue reviewing your new set of patches as time permits.

Thank you! Actually there is no big difference between v6 and v7 regarding
the UniqueKey part except 2 bug fixes. However, v7 has more documentation,
comment improvements, and code refactoring/splitting, which may be helpful
for review. You may try v7 next time if v8 has not come yet :)

Best Regards
Andy Fan

#27Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#26)
7 attachment(s)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Attached are the v8 patches. The main improvements are based on Ashutosh's
review (reducing the SRF impact and partition-level UniqueKey). I also
updated the README.uniquekey based on the discussion, so anyone who doesn't
want to go through the long email thread can read the README.uniquekey
first.

===
Just copy some content from the README for easy discussion.

As for inheritance tables, we maintain the UniqueKey on the childrel as
usual. But for partitioned tables we need to maintain 2 different kinds of
UniqueKey: 1) UniqueKey on the parent relation; 2) UniqueKey on the child
relation for partition-wise queries.

Example:
CREATE TABLE p (a int not null, b int not null) partition by list (a);
CREATE TABLE p0 partition of p for values in (1);
CREATE TABLE p1 partition of p for values in (2);

create unique index p0_b on p0(b);
create unique index p1_b on p1(b);

Now b is unique at the partition level only, so the DISTINCT can't be
removed in the following case: SELECT DISTINCT b FROM p; However, for the
query SELECT DISTINCT b FROM p WHERE a = 1; where only one partition is
chosen, the UniqueKey on the child relation is the same as the UniqueKey
on the parent relation, so the DISTINCT can be removed.

Another use of partition-level UniqueKeys is that they are helpful for
partition-wise joins.

As for the UniqueKey at the parent table level, it arises in 2 different
ways.

1) The UniqueKey is also derived from a unique index, but the index must be
the same in all the related child relations and the unique index must
contain the partition key. Example:

CREATE UNIQUE INDEX p_ab ON p(a, b); -- where a is the partition key.

-- Query
SELECT a, b FROM p; -- the (a, b) is a UniqueKey of p.

2) If the parent relation has only one childrel, the UniqueKey on the
childrel is the UniqueKey on the parent as well.

The patch structure is unchanged; you can see [1] for reference. The
patches are based on the latest commit ac3a4866c0.

[1]: /messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com

Best Regards
Andy Fan


Attachments:

v8-0004-Remove-distinct-node-AggNode-if-the-input-is-uniq.patch (application/octet-stream)
From 1ab0af8a74e34a5cb2952f2ad83e66211fd00934 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 11 May 2020 15:54:49 +0800
Subject: [PATCH v8 4/7] Remove distinct node & AggNode if the input is unique
 already.

---
 src/backend/optimizer/plan/planner.c          |  30 +-
 src/test/regress/expected/aggregates.out      |  73 ++-
 src/test/regress/expected/select_distinct.out | 425 ++++++++++++++++++
 src/test/regress/sql/aggregates.sql           |   6 +-
 src/test/regress/sql/select_distinct.sql      | 155 +++++++
 5 files changed, 639 insertions(+), 50 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 142b9782fb..74dcd0ff3d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3812,7 +3812,27 @@ create_grouping_paths(PlannerInfo *root,
 {
 	Query	   *parse = root->parse;
 	RelOptInfo *grouped_rel;
+	bool	group_unique_input = false;
 	RelOptInfo *partially_grouped_rel;
+	List	*groupExprs = NIL;
+
+	if (root->parse->groupingSets == NIL)
+	{
+		groupExprs  = get_sortgrouplist_exprs(parse->groupClause,
+														parse->targetList);
+		/*
+		 * If the groupby clauses is unique already,  groupping node is not necessary
+		 * if there is no aggregation functions.
+		 */
+		group_unique_input = relation_has_uniquekeys_for(root, input_rel,
+														 groupExprs, false);
+		if (group_unique_input &&
+			groupExprs != NIL &&
+			!parse->hasAggs &&
+			!parse->hasWindowFuncs &&
+			parse->havingQual == NULL)
+			return input_rel;
+	}
 
 	/*
 	 * Create grouping relation to hold fully aggregated grouping and/or
@@ -4739,6 +4759,12 @@ create_distinct_paths(PlannerInfo *root,
 	bool		allow_hash;
 	Path	   *path;
 	ListCell   *lc;
+	List	   *distinctExprs =  get_sortgrouplist_exprs(parse->distinctClause,
+														 parse->targetList);
+
+	/* If the result is unique already, we just return the input_rel directly */
+	if (relation_has_uniquekeys_for(root, input_rel, distinctExprs, false))
+		return input_rel;
 
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
@@ -4776,10 +4802,6 @@ create_distinct_paths(PlannerInfo *root,
 		/*
 		 * Otherwise, the UNIQUE filter has effects comparable to GROUP BY.
 		 */
-		List	   *distinctExprs;
-
-		distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
-												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
 											  NULL);
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 14cdcfcca6..42bd180895 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -870,14 +870,12 @@ explain (costs off)
   select distinct max(unique2) from tenk1;
                              QUERY PLAN                              
 ---------------------------------------------------------------------
- HashAggregate
-   Group Key: $0
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
                  Index Cond: (unique2 IS NOT NULL)
-   ->  Result
-(7 rows)
+(5 rows)
 
 select distinct max(unique2) from tenk1;
  max  
@@ -1036,7 +1034,7 @@ explain (costs off)
   select distinct min(f1), max(f1) from minmaxtest;
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
- Unique
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Merge Append
@@ -1059,10 +1057,7 @@ explain (costs off)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
-   ->  Sort
-         Sort Key: ($0), ($1)
-         ->  Result
-(26 rows)
+(23 rows)
 
 select distinct min(f1), max(f1) from minmaxtest;
  min | max 
@@ -1092,12 +1087,10 @@ create temp table t2 (x int, y int, z int, primary key (x, y));
 create temp table t3 (a int, b int, c int, primary key(a, b) deferrable);
 -- Non-primary-key columns can be removed from GROUP BY
 explain (costs off) select * from t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 -- No removal can happen if the complete PK is not present in GROUP BY
 explain (costs off) select a,c from t1 group by a,c,d;
@@ -1109,29 +1102,27 @@ explain (costs off) select a,c from t1 group by a,c,d;
 (3 rows)
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.y
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
                       QUERY PLAN                      
 ------------------------------------------------------
  HashAggregate
    Group Key: t1.a, t1.b, t2.x, t2.z
-   ->  Hash Join
+   ->  Hash Left Join
          Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
          ->  Seq Scan on t2
          ->  Hash
@@ -1161,12 +1152,10 @@ explain (costs off) select * from t1 group by a,b,c,d;
 
 -- Okay to remove columns if we're only querying the parent.
 explain (costs off) select * from only t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 create temp table p_t1 (
   a int,
@@ -1179,14 +1168,12 @@ create temp table p_t1_1 partition of p_t1 for values in(1);
 create temp table p_t1_2 partition of p_t1 for values in(2);
 -- Ensure we can remove non-PK columns for partitioned tables.
 explain (costs off) select * from p_t1 group by a,b,c,d;
-           QUERY PLAN           
---------------------------------
- HashAggregate
-   Group Key: p_t1.a, p_t1.b
-   ->  Append
-         ->  Seq Scan on p_t1_1
-         ->  Seq Scan on p_t1_2
-(5 rows)
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on p_t1_1
+   ->  Seq Scan on p_t1_2
+(3 rows)
 
 drop table t1 cascade;
 NOTICE:  drop cascades to table t1c
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 11c6f50fbf..53cd7d7234 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -306,3 +306,428 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+    QUERY PLAN    
+------------------
+ Seq Scan on uqk1
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+           QUERY PLAN            
+---------------------------------
+ HashAggregate
+   Group Key: d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+     QUERY PLAN     
+--------------------
+ Seq Scan on uqk1
+   Filter: (pk = 1)
+(2 rows)
+
+-- Test join
+-- both uqk1 (c, d) and uqk2(pk) are unique key, so distinct is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+           QUERY PLAN            
+---------------------------------
+ Hash Join
+   Hash Cond: (uqk1.a = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+   ->  Hash
+         ->  Seq Scan on uqk2
+(6 rows)
+
+-- Distinct is needed since the outer join allows multi null values.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+                 QUERY PLAN                  
+---------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.pk
+         ->  Hash Right Join
+               Hash Cond: (uqk1.a = uqk2.pk)
+               ->  Seq Scan on uqk1
+               ->  Hash
+                     ->  Seq Scan on uqk2
+(8 rows)
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+ pk 
+----
+  1
+   
+(2 rows)
+
+-- Distinct is not needed since uqk1 is the left table in outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+           QUERY PLAN            
+---------------------------------
+ Sort
+   Sort Key: uqk1.c, uqk1.d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+ c | d 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+(3 rows)
+
+-- The combination of UniqueKey is still unique no mather the join clause
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+          QUERY PLAN          
+------------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+   ->  Materialize
+         ->  Seq Scan on uqk2
+(4 rows)
+
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+ a | pk | c | d | a | pk | c | d 
+---+----+---+---+---+----+---+---
+ 1 |  1 | 1 | 1 | 1 |  1 | 1 | 1
+ 1 |  1 | 1 | 1 | 4 |  4 | 4 | 4
+ 1 |  1 | 1 | 1 | 5 |  5 | 5 | 5
+ 2 |  2 | 2 | 2 | 1 |  1 | 1 | 1
+ 2 |  2 | 2 | 2 | 4 |  4 | 4 | 4
+ 2 |  2 | 2 | 2 | 5 |  5 | 5 | 5
+ 3 |  3 | 3 | 3 | 1 |  1 | 1 | 1
+ 3 |  3 | 3 | 3 | 4 |  4 | 4 | 4
+ 3 |  3 | 3 | 3 | 5 |  5 | 5 | 5
+(9 rows)
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+           QUERY PLAN           
+--------------------------------
+ Hash Semi Join
+   Hash Cond: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+   ->  Hash
+         ->  Seq Scan on uqk2
+(5 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+             QUERY PLAN             
+------------------------------------
+ Seq Scan on uqk1
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on uqk2
+(4 rows)
+
+-- Test Unique Key FOR one-row case, DISTINCT is NOT needed as well.
+-- uqk1.d is the a uniquekey due to onerow rule. uqk2.pk is pk
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+            QUERY PLAN             
+-----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+(5 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both uqk1.d AND uqk2.c are the a uniquekey due to onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+            QUERY PLAN            
+----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(6 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both UniqueKey in targetList, so distinct is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+        QUERY PLAN        
+--------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(5 rows)
+
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+ c | c 
+---+---
+ 2 | 1
+(1 row)
+
+-- Test SubQuery
+-- t2(a, c) is UNIQUE key because of group by
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+              QUERY PLAN               
+---------------------------------------
+ Hash Join
+   Hash Cond: ((sum(uqk2.pk)) = t1.pk)
+   ->  HashAggregate
+         Group Key: uqk2.a, uqk2.c
+         ->  Seq Scan on uqk2
+   ->  Hash
+         ->  Seq Scan on uqk1 t1
+(7 rows)
+
+-- Test partitioned TABLE
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+-- create unique index on dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP INDEX dist_p_uk_b_c;
+-- we also support create unqiue index on each child tables
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- NOT ok, since dist_p1 have no such INDEX
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ HashAggregate
+   Group Key: dist_p.a, dist_p.b, dist_p.c, dist_p.d
+   ->  Append
+         ->  Seq Scan on dist_p0 dist_p_1
+         ->  Seq Scan on dist_p1 dist_p_2
+(5 rows)
+
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP TABLE dist_p;
+-- Test partition with different order.
+CREATE TABLE p (a INT NOT NULL, b DATE NOT NULL,c INT NOT NULL, d VARCHAR(5) NOT NULL) PARTITION BY LIST (a);
+-- First partition with different attr order.
+CREATE TABLE p1(d VARCHAR(5) NOT NULL, c INT NOT NULL, b DATE NOT NULL, a INT NOT NULL);
+ALTER TABLE p ATTACH PARTITION p1 FOR VALUES IN (1);
+-- 2nd partition
+CREATE TABLE p2 PARTITION OF p FOR VALUES IN (2);
+-- Unique Index locally without partition key.
+CREATE UNIQUE index p1_unq_d ON p1(d);
+CREATE UNIQUE index p2_unq_d ON p1(d);
+-- Unique Index with
+CREATE UNIQUE index p_unq_ac ON  p(a, c);
+-- expression index on partition, we didn't convert expression from childrel
+-- to parentrel so we can't handle it.
+CREATE UNIQUE INDEX p_expr_b ON p(date_trunc('hour', b));
+ERROR:  functions in index expression must be marked IMMUTABLE
+CREATE UNIQUE INDEX p_expr_ab ON p(a, date_trunc('hour', b));
+ERROR:  functions in index expression must be marked IMMUTABLE
+ANALYZE p;
+-- d is unique for its owner partition only.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM P;
+           QUERY PLAN           
+--------------------------------
+ HashAggregate
+   Group Key: p.d
+   ->  Append
+         ->  Seq Scan on p1 p_1
+         ->  Seq Scan on p2 p_2
+(5 rows)
+
+-- d is unique for its owner partition, but this query only talk to one partition.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM p WHERE a = 1;
+              QUERY PLAN               
+---------------------------------------
+ Bitmap Heap Scan on p1 p
+   Recheck Cond: (a = 1)
+   ->  Bitmap Index Scan on p1_a_c_idx
+         Index Cond: (a = 1)
+(4 rows)
+
+-- (a,c) in unique locally, but constains the partition key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT a, c FROM p WHERE a = 1;
+              QUERY PLAN               
+---------------------------------------
+ Bitmap Heap Scan on p1 p
+   Recheck Cond: (a = 1)
+   ->  Bitmap Index Scan on p1_a_c_idx
+         Index Cond: (a = 1)
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c FROM p WHERE a = 1;
+              QUERY PLAN               
+---------------------------------------
+ Bitmap Heap Scan on p1 p
+   Recheck Cond: (a = 1)
+   ->  Bitmap Index Scan on p1_a_c_idx
+         Index Cond: (a = 1)
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM p WHERE c = 1;
+                 QUERY PLAN                  
+---------------------------------------------
+ Append
+   ->  Bitmap Heap Scan on p1 p_1
+         Recheck Cond: (c = 1)
+         ->  Bitmap Index Scan on p1_a_c_idx
+               Index Cond: (c = 1)
+   ->  Bitmap Heap Scan on p2 p_2
+         Recheck Cond: (c = 1)
+         ->  Bitmap Index Scan on p2_a_c_idx
+               Index Cond: (c = 1)
+(9 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT date_trunc('hour', b) FROM p WHERE a = 1;
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: (date_trunc('hour'::text, (p.b)::timestamp with time zone))
+         ->  Bitmap Heap Scan on p1 p
+               Recheck Cond: (a = 1)
+               ->  Bitmap Index Scan on p1_a_c_idx
+                     Index Cond: (a = 1)
+(7 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a, date_trunc('hour', b) FROM p;
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
+ HashAggregate
+   Group Key: p.a, (date_trunc('hour'::text, (p.b)::timestamp with time zone))
+   ->  Append
+         ->  Seq Scan on p1 p_1
+         ->  Seq Scan on p2 p_2
+(5 rows)
+
+DROP TABLE p;
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+DEALLOCATE pt;
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
+-- Test Whole Row Not NULL cases
+CREATE TABLE x (a int UNIQUE);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM x WHERE x = '(1)'::x;
+         QUERY PLAN         
+----------------------------
+ Seq Scan on x
+   Filter: (x.* = '(1)'::x)
+(2 rows)
+
+DROP TABLE x;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 9480abd577..3446c3e9fd 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -394,13 +394,13 @@ explain (costs off) select * from t1 group by a,b,c,d;
 explain (costs off) select a,c from t1 group by a,c,d;
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
 
 -- Cannot optimize when PK is deferrable
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 33102744eb..d8715bcf56 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -135,3 +135,158 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+
+-- Test join
+-- both uqk1 (c, d) and uqk2(pk) are unique key, so distinct is not needed.
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+
+-- Distinct is needed since the outer join allows multi null values.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+
+-- Distinct is not needed since uqk1 is the left table in outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+-- The combination of UniqueKey is still unique no mather the join clause
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+
+-- Test Unique Key FOR one-row case, DISTINCT is NOT needed as well.
+-- uqk1.d is the a uniquekey due to onerow rule. uqk2.pk is pk
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+
+-- Both uqk1.d AND uqk2.c are the a uniquekey due to onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+
+-- Both UniqueKey in targetList, so distinct is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+
+-- Test SubQuery
+-- t2(a, c) is UNIQUE key because of group by
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+
+-- Test partitioned TABLE
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+
+-- create unique index on dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+DROP INDEX dist_p_uk_b_c;
+
+-- we also support create unqiue index on each child tables
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- NOT ok, since dist_p1 have no such INDEX
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+
+DROP TABLE dist_p;
+
+-- Test partition with different order.
+CREATE TABLE p (a INT NOT NULL, b DATE NOT NULL,c INT NOT NULL, d VARCHAR(5) NOT NULL) PARTITION BY LIST (a);
+
+-- First partition with different attr order.
+CREATE TABLE p1(d VARCHAR(5) NOT NULL, c INT NOT NULL, b DATE NOT NULL, a INT NOT NULL);
+ALTER TABLE p ATTACH PARTITION p1 FOR VALUES IN (1);
+
+-- 2nd partition
+CREATE TABLE p2 PARTITION OF p FOR VALUES IN (2);
+
+-- Unique Index locally without partition key.
+CREATE UNIQUE index p1_unq_d ON p1(d);
+CREATE UNIQUE index p2_unq_d ON p1(d);
+
+-- Unique Index with
+CREATE UNIQUE index p_unq_ac ON  p(a, c);
+
+-- expression index on partition, we didn't convert expression from childrel
+-- to parentrel so we can't handle it.
+CREATE UNIQUE INDEX p_expr_b ON p(date_trunc('hour', b));
+CREATE UNIQUE INDEX p_expr_ab ON p(a, date_trunc('hour', b));
+
+ANALYZE p;
+
+-- d is unique for its owner partition only.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM P;
+
+-- d is unique for its owner partition, but this query only talk to one partition.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM p WHERE a = 1;
+
+-- (a,c) in unique locally, but constains the partition key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT a, c FROM p WHERE a = 1;
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c FROM p WHERE a = 1;
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM p WHERE c = 1;
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT date_trunc('hour', b) FROM p WHERE a = 1;
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a, date_trunc('hour', b) FROM p;
+
+DROP TABLE p;
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+DEALLOCATE pt;
+
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
+
+-- Test Whole Row Not NULL cases
+CREATE TABLE x (a int UNIQUE);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM x WHERE x = '(1)'::x;
+DROP TABLE x;
-- 
2.21.0

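Not part of any patch: a rough Python sketch, for reviewers, of the rule the partitioned-table tests above exercise. All names here are made up for illustration; this is not PostgreSQL code.

```python
def unique_index_proves_global_uniqueness(index_cols, partition_key_cols,
                                          pruned_to_single_partition):
    """A unique index defined per-partition proves uniqueness across the
    whole partitioned table only if it covers the partition key, or if the
    query has been pruned down to a single partition."""
    if pruned_to_single_partition:
        return True
    return set(partition_key_cols) <= set(index_cols)
```

So the local index on d alone can't justify removing the DISTINCT over the whole table, while p_unq_ac on (a, c) can, and WHERE a = 1 rescues the d-only case by pruning to one partition.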
v8-0003-Refactor-existing-uniqueness-related-code-to-use-.patchapplication/octet-stream; name=v8-0003-Refactor-existing-uniqueness-related-code-to-use-.patchDownload
From dd6683591227f7cc87e256c657ceb5c79bf889ed Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 6 May 2020 15:44:15 +0800
Subject: [PATCH v8 3/7] Refactor existing uniqueness related code to use
 UniqueKey

The calls to remove_useless_joins and reduce_unique_semijoins are
postponed as a result.
---
 .../postgres_fdw/expected/postgres_fdw.out    |  32 ++--
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   1 +
 src/backend/optimizer/path/allpaths.c         |  18 ++-
 src/backend/optimizer/plan/analyzejoins.c     | 137 ++++--------------
 src/backend/optimizer/plan/planmain.c         |  13 --
 src/test/regress/expected/join.out            |  59 ++++----
 src/test/regress/sql/join.sql                 |  16 +-
 7 files changed, 97 insertions(+), 179 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..d71b6cc556 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1378,6 +1378,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1
 (8 rows)
 
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY: The inner table in the plan is "T 3", which has a primary key on c1; that is why the plan now shows "Inner Unique: true".
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
                                                                                                                                                                                              QUERY PLAN                                                                                                                                                                                             
@@ -1386,6 +1387,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
    Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
    ->  Nested Loop
          Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+         Inner Unique: true
          ->  Foreign Scan
                Output: ft4.c1, ft4.*, ft5.c1, ft5.*
                Relations: (public.ft4) FULL JOIN (public.ft5)
@@ -1410,7 +1412,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
                ->  Seq Scan on "S 1"."T 3"
                      Output: "T 3".c1, "T 3".ctid
                      Filter: ("T 3".c1 = 50)
-(28 rows)
+(29 rows)
 
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
  c1 | a  | b  
@@ -2902,22 +2904,20 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 f
 -- Outer query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
- Unique
+                                                       QUERY PLAN                                                       
+------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: ((SubPlan 1))
-   ->  Sort
-         Output: ((SubPlan 1))
-         Sort Key: ((SubPlan 1))
-         ->  Foreign Scan
-               Output: (SubPlan 1)
-               Relations: Aggregate on (public.ft2 t2)
-               Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
-               SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+   Sort Key: ((SubPlan 1))
+   ->  Foreign Scan
+         Output: (SubPlan 1)
+         Relations: Aggregate on (public.ft2 t2)
+         Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
+         SubPlan 1
+           ->  Foreign Scan on public.ft1 t1
+                 Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                 Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(11 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..a42cfa134d 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -453,6 +453,7 @@ EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY: The inner table in the plan is "T 3", which has a primary key on c1; that is why the plan now shows "Inner Unique: true".
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index aaa820d68d..3f854ce161 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -39,6 +39,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
@@ -222,13 +223,24 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	set_base_rel_pathlists(root);
 
 	/*
-	 * Generate access paths for the entire join tree.
+	 * Remove any useless outer joins.  Ideally this would be done during
+	 * jointree preprocessing, but the necessary information isn't available
+	 * until we've built baserel data structures, classified qual clauses,
+	 * and built uniquekeys.
 	 */
-	rel = make_rel_from_joinlist(root, joinlist);
+	joinlist = remove_useless_joins(root, joinlist);
+
+	/*
+	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
+	 * Likewise, this can't be done until now for lack of needed info.
+	 */
+	reduce_unique_semijoins(root);
 
 	/*
-	 * The result should join all and only the query's base rels.
+	 * Generate access paths for the entire join tree.
 	 */
+	rel = make_rel_from_joinlist(root, joinlist);
+
 	Assert(bms_equal(rel->relids, root->all_baserels));
 
 	return rel;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0ff660284..d80bff65d2 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -439,6 +439,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
 	 * There may be references to the rel in root->fkey_list, but if so,
 	 * match_foreign_keys_to_quals() will get rid of them.
 	 */
+
+	root->all_baserels = bms_del_member(root->all_baserels, relid);
 }
 
 /*
@@ -584,39 +586,7 @@ reduce_unique_semijoins(PlannerInfo *root)
 static bool
 rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 {
-	/* We only know about baserels ... */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * For a plain relation, we only know how to prove uniqueness by
-		 * reference to unique indexes.  Make sure there's at least one
-		 * suitable unique index.  It must be immediately enforced, and if
-		 * it's a partial index, it must match the query.  (Keep these
-		 * conditions in sync with relation_has_unique_index_for!)
-		 */
-		ListCell   *lc;
-
-		foreach(lc, rel->indexlist)
-		{
-			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
-
-			if (ind->unique && ind->immediate &&
-				(ind->indpred == NIL || ind->predOK))
-				return true;
-		}
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Query	   *subquery = root->simple_rte_array[rel->relid]->subquery;
-
-		/* Check if the subquery has any qualities that support distinctness */
-		if (query_supports_distinctness(subquery))
-			return true;
-	}
-	/* We have no proof rules for any other rtekinds. */
-	return false;
+	return rel->uniquekeys != NIL;
 }
 
 /*
@@ -640,83 +610,33 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 static bool
 rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
 {
-	/*
-	 * We could skip a couple of tests here if we assume all callers checked
-	 * rel_supports_distinctness first, but it doesn't seem worth taking any
-	 * risk for.
-	 */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * Examine the indexes to see if we have a matching unique index.
-		 * relation_has_unique_index_for automatically adds any usable
-		 * restriction clauses for the rel, so we needn't do that here.
-		 */
-		if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
-			return true;
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Index		relid = rel->relid;
-		Query	   *subquery = root->simple_rte_array[relid]->subquery;
-		List	   *colnos = NIL;
-		List	   *opids = NIL;
-		ListCell   *l;
 
-		/*
-		 * Build the argument lists for query_is_distinct_for: a list of
-		 * output column numbers that the query needs to be distinct over, and
-		 * a list of equality operators that the output columns need to be
-		 * distinct according to.
-		 *
-		 * (XXX we are not considering restriction clauses attached to the
-		 * subquery; is that worth doing?)
-		 */
-		foreach(l, clause_list)
+	ListCell   *lc1, *lc2, *lc3;
+	foreach(lc1,  rel->uniquekeys)
+	{
+		UniqueKey *uqk = lfirst_node(UniqueKey, lc1);
+		bool all_uqk_exprs_found = true;
+		foreach(lc2, uqk->exprs)
 		{
-			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
-			Oid			op;
-			Var		   *var;
-
-			/*
-			 * Get the equality operator we need uniqueness according to.
-			 * (This might be a cross-type operator and thus not exactly the
-			 * same operator the subquery would consider; that's all right
-			 * since query_is_distinct_for can resolve such cases.)  The
-			 * caller's mergejoinability test should have selected only
-			 * OpExprs.
-			 */
-			op = castNode(OpExpr, rinfo->clause)->opno;
-
-			/* caller identified the inner side for us */
-			if (rinfo->outer_is_left)
-				var = (Var *) get_rightop(rinfo->clause);
-			else
-				var = (Var *) get_leftop(rinfo->clause);
-
-			/*
-			 * We may ignore any RelabelType node above the operand.  (There
-			 * won't be more than one, since eval_const_expressions() has been
-			 * applied already.)
-			 */
-			if (var && IsA(var, RelabelType))
-				var = (Var *) ((RelabelType *) var)->arg;
-
-			/*
-			 * If inner side isn't a Var referencing a subquery output column,
-			 * this clause doesn't help us.
-			 */
-			if (!var || !IsA(var, Var) ||
-				var->varno != relid || var->varlevelsup != 0)
-				continue;
-
-			colnos = lappend_int(colnos, var->varattno);
-			opids = lappend_oid(opids, op);
+			Node *uq_expr = lfirst(lc2);
+			bool found_uq_expr_in_clause_list = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *clause_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				if (equal(uq_expr, clause_expr))
+				{
+					found_uq_expr_in_clause_list = true;
+					break;
+				}
+			}
+			if (!found_uq_expr_in_clause_list)
+			{
+				all_uqk_exprs_found = false;
+				break;
+			}
 		}
-
-		if (query_is_distinct_for(subquery, colnos, opids))
+		if (all_uqk_exprs_found)
 			return true;
 	}
 	return false;
@@ -972,6 +892,9 @@ innerrel_is_unique(PlannerInfo *root,
 	MemoryContext old_context;
 	ListCell   *lc;
 
+	if (relation_is_onerow(innerrel))
+		return true;
+
 	/* Certainly can't prove uniqueness when there are no joinclauses */
 	if (restrictlist == NIL)
 		return false;
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..6ad73cb57b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -213,19 +213,6 @@ query_planner(PlannerInfo *root,
 	 */
 	fix_placeholder_input_needed_levels(root);
 
-	/*
-	 * Remove any useless outer joins.  Ideally this would be done during
-	 * jointree preprocessing, but the necessary information isn't available
-	 * until we've built baserel data structures and classified qual clauses.
-	 */
-	joinlist = remove_useless_joins(root, joinlist);
-
-	/*
-	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
-	 * Likewise, this can't be done until now for lack of needed info.
-	 */
-	reduce_unique_semijoins(root);
-
 	/*
 	 * Now distribute "placeholders" to base rels as needed.  This has to be
 	 * done after join removal because removal could change whether a
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a46b1573bd..8378936eda 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4349,11 +4349,11 @@ reset enable_nestloop;
 --
 begin;
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 -- all three cases should be optimizable into a simple seqscan
@@ -4411,40 +4411,37 @@ select d.* from d left join (select distinct * from b) s
 (1 row)
 
 -- join removal is not possible when the GROUP BY contains a column that is
--- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
--- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
--- but this happens too late for join removal in the outer plan level.)
+-- not in the join condition.
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
-                QUERY PLAN                
-------------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Group
-         Group Key: b.id
-         ->  Index Scan using b_pkey on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.d, b.c_id
+                     ->  Seq Scan on b
 (8 rows)
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(9 rows)
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.c_id, b.d
+                     ->  Seq Scan on b
+(8 rows)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 1403e0ffe7..3312542411 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1479,11 +1479,11 @@ reset enable_nestloop;
 begin;
 
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 
@@ -1512,17 +1512,15 @@ select d.* from d left join (select distinct * from b) s
   on d.a = s.id and d.b = s.c_id;
 
 -- join removal is not possible when the GROUP BY contains a column that is
--- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
--- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
--- but this happens too late for join removal in the outer plan level.)
+-- not in the join condition.
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
-- 
2.21.0

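For reviewers, the shape of the rewritten rel_is_distinct_for is easy to state outside of C: the rel is provably distinct for the join clauses if, for some UniqueKey, every one of its expressions is matched by a clause expression. A minimal Python sketch (names invented, not the patch's code):

```python
def rel_is_distinct_for(uniquekeys, clause_exprs):
    """uniquekeys: list of unique keys, each a list of expressions;
    clause_exprs: set of expressions appearing in the join clauses.
    True if some unique key is fully covered by the clauses."""
    return any(all(expr in clause_exprs for expr in uk)
               for uk in uniquekeys)
```

Note that an empty uniquekeys list yields False, which matches the simplified rel_supports_distinctness test (rel->uniquekeys != NIL) in the patch.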
v8-0001-Introduce-RelOptInfo-notnullattrs-attribute.patchapplication/octet-stream; name=v8-0001-Introduce-RelOptInfo-notnullattrs-attribute.patchDownload
From 939301ac5db70dbdaf82b6ee03500289f5c1935c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sun, 3 May 2020 22:37:46 +0800
Subject: [PATCH v8 1/7] Introduce RelOptInfo->notnullattrs attribute

---
 src/backend/optimizer/path/allpaths.c  | 31 ++++++++++++++++++++++++++
 src/backend/optimizer/plan/initsplan.c | 10 +++++++++
 src/backend/optimizer/util/plancat.c   | 10 +++++++++
 src/include/nodes/pathnodes.h          |  2 ++
 4 files changed, 53 insertions(+)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 255f56b827..44c06cfff2 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1002,6 +1002,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		RelOptInfo *childrel;
 		ListCell   *parentvars;
 		ListCell   *childvars;
+		int i = -1;
 
 		/* append_rel_list contains all append rels; ignore others */
 		if (appinfo->parent_relid != parentRTindex)
@@ -1058,6 +1059,36 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 								   (Node *) rel->reltarget->exprs,
 								   1, &appinfo);
 
+		/* Copy notnullattrs. */
+		while ((i = bms_next_member(rel->notnullattrs, i)) > 0)
+		{
+			AttrNumber attno = i + FirstLowInvalidHeapAttributeNumber;
+			AttrNumber child_attno;
+			if (attno == 0)
+			{
+				/* Whole row is not null, so the same must hold for the child */
+				childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+														attno - FirstLowInvalidHeapAttributeNumber);
+				continue;
+			}
+			if (attno < 0)
+				/* no need to translate system column */
+				child_attno = attno;
+			else
+			{
+				Node *node = list_nth(appinfo->translated_vars, attno - 1);
+				if (!IsA(node, Var))
+					/* This may happen in the UNION case, e.g. (SELECT a FROM t1
+					 * UNION SELECT a + 3 FROM t2) t, where t.a is known not null
+					 * but the translated expression is not a plain Var. */
+					continue;
+				child_attno = castNode(Var, node)->varattno;
+			}
+
+			childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+													child_attno - FirstLowInvalidHeapAttributeNumber);
+		}
+
 		/*
 		 * We have to make child entries in the EquivalenceClass data
 		 * structures as well.  This is needed either if the parent
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e978b491f6..95b1b14cd3 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -830,6 +830,16 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		{
 			Node	   *qual = (Node *) lfirst(l);
 
+			/* Set the not null info now */
+			ListCell	*lc;
+			List		*non_nullable_vars = find_nonnullable_vars(qual);
+			foreach(lc, non_nullable_vars)
+			{
+				Var *var = lfirst_node(Var, lc);
+				RelOptInfo *rel = root->simple_rel_array[var->varno];
+				rel->notnullattrs = bms_add_member(rel->notnullattrs,
+												   var->varattno - FirstLowInvalidHeapAttributeNumber);
+			}
 			distribute_qual_to_rels(root, qual,
 									false, below_outer_join, JOIN_INNER,
 									root->qual_security_level,
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 25545029d7..0b2f9d398a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -463,6 +464,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->notnullattrs == NULL);
+	for(i = 0; i < relation->rd_att->natts; i++)
+	{
+		FormData_pg_attribute attr = relation->rd_att->attrs[i];
+		if (attr.attnotnull)
+			rel->notnullattrs = bms_add_member(rel->notnullattrs,
+											   attr.attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1c83772d62..907afd383d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -709,6 +709,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	/* Attrs known not-null, offset by -FirstLowInvalidHeapAttributeNumber */
+	Bitmapset		*notnullattrs;
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
-- 
2.21.0

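One detail of patch 0001 worth spelling out: Bitmapsets only hold non-negative members, while attribute numbers can be negative (system columns) or zero (whole row), so every attno is shifted by -FirstLowInvalidHeapAttributeNumber before going into notnullattrs. A toy Python model of that offsetting (the constant's real value lives in the PostgreSQL headers and varies by version; -8 here is purely illustrative):

```python
FIRST_LOW_INVALID_HEAP_ATTRIBUTE_NUMBER = -8  # illustrative value only

def add_notnull_attr(notnullattrs, attno):
    # Shift so that negative/zero attnos become valid set members.
    notnullattrs.add(attno - FIRST_LOW_INVALID_HEAP_ATTRIBUTE_NUMBER)

def attr_is_notnull(notnullattrs, attno):
    return (attno - FIRST_LOW_INVALID_HEAP_ATTRIBUTE_NUMBER) in notnullattrs
```

The same shift has to be undone (or reapplied consistently) wherever the bitmap is consulted, which is why every bms_add_member call in the patch uses the identical expression.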
v8-0005-If-the-group-by-clause-is-unique-and-we-have-aggr.patchapplication/octet-stream; name=v8-0005-If-the-group-by-clause-is-unique-and-we-have-aggr.patchDownload
From ecb14a529bc91a2a806e00f93be6402fef52b879 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 6 May 2020 16:32:28 +0800
Subject: [PATCH v8 5/7] If the group by clause is unique and we have
 an aggregate function,

We treat the input as sorted without an explicit sort, since each
group contains exactly one row.
---
 src/backend/commands/explain.c           |   4 +
 src/backend/executor/nodeAgg.c           |   6 ++
 src/backend/optimizer/plan/createplan.c  |   3 +-
 src/backend/optimizer/plan/planner.c     | 127 +++++++++++++++--------
 src/include/nodes/nodes.h                |   3 +-
 src/include/nodes/pathnodes.h            |   1 +
 src/include/nodes/plannodes.h            |   1 +
 src/test/regress/expected/aggregates.out |  42 ++++++++
 src/test/regress/sql/aggregates.sql      |  17 +++
 9 files changed, 159 insertions(+), 45 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5695802081..a7b38cfc08 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1929,6 +1929,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			show_agg_keys(castNode(AggState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			show_hashagg_info((AggState *) planstate, es);
+			if (es->format != EXPLAIN_FORMAT_TEXT ||
+				(es->verbose && ((Agg *) plan)->input_unique))
+				ExplainPropertyBool("Input Unique",
+									((Agg *) plan)->input_unique, es);
 			if (plan->qual)
 				show_instrumentation_count("Rows Removed by Filter", 1,
 										   planstate, es);
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 9f4229de60..2d5493c744 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -2145,6 +2145,12 @@ ExecAgg(PlanState *pstate)
 			case AGG_SORTED:
 				result = agg_retrieve_direct(node);
 				break;
+			case AGG_UNIQUE:
+				/* AGG_UNIQUE is translated to AGG_SORTED; this case exists
+				 * only to keep the compiler quiet.
+				 */
+				Assert(false);
+				break;
 		}
 
 		if (!TupIsNull(result))
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 9941dfe65e..0049d22227 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -6353,7 +6353,7 @@ make_agg(List *tlist, List *qual,
 	/* Reduce to long, but 'ware overflow! */
 	numGroups = (long) Min(dNumGroups, (double) LONG_MAX);
 
-	node->aggstrategy = aggstrategy;
+	node->aggstrategy = aggstrategy == AGG_UNIQUE ? AGG_SORTED : aggstrategy;
 	node->aggsplit = aggsplit;
 	node->numCols = numGroupCols;
 	node->grpColIdx = grpColIdx;
@@ -6364,6 +6364,7 @@ make_agg(List *tlist, List *qual,
 	node->aggParams = NULL;		/* SS_finalize_plan() will fill this */
 	node->groupingSets = groupingSets;
 	node->chain = chain;
+	node->input_unique = aggstrategy == AGG_UNIQUE;
 
 	plan->qual = qual;
 	plan->targetlist = tlist;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 74dcd0ff3d..1e72411d51 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3852,51 +3852,61 @@ create_grouping_paths(PlannerInfo *root,
 		int			flags = 0;
 		GroupPathExtraData extra;
 
-		/*
-		 * Determine whether it's possible to perform sort-based
-		 * implementations of grouping.  (Note that if groupClause is empty,
-		 * grouping_is_sortable() is trivially true, and all the
-		 * pathkeys_contained_in() tests will succeed too, so that we'll
-		 * consider every surviving input path.)
-		 *
-		 * If we have grouping sets, we might be able to sort some but not all
-		 * of them; in this case, we need can_sort to be true as long as we
-		 * must consider any sorted-input plan.
-		 */
-		if ((gd && gd->rollups != NIL)
-			|| grouping_is_sortable(parse->groupClause))
-			flags |= GROUPING_CAN_USE_SORT;
+		if (group_unique_input)
+		{
+			/* In this case we don't need to set other flags */
+			Assert(parse->groupClause != NIL);
+			Assert(gd == NULL);
+			flags |= GROUPING_INPUT_UNIQUE;
+		}
+		else
+		{
+			/*
+			 * Determine whether it's possible to perform sort-based
+			 * implementations of grouping.  (Note that if groupClause is empty,
+			 * grouping_is_sortable() is trivially true, and all the
+			 * pathkeys_contained_in() tests will succeed too, so that we'll
+			 * consider every surviving input path.)
+			 *
+			 * If we have grouping sets, we might be able to sort some but not all
+			 * of them; in this case, we need can_sort to be true as long as we
+			 * must consider any sorted-input plan.
+			 */
+			if ((gd && gd->rollups != NIL)
+				|| grouping_is_sortable(parse->groupClause))
+				flags |= GROUPING_CAN_USE_SORT;
 
-		/*
-		 * Determine whether we should consider hash-based implementations of
-		 * grouping.
-		 *
-		 * Hashed aggregation only applies if we're grouping. If we have
-		 * grouping sets, some groups might be hashable but others not; in
-		 * this case we set can_hash true as long as there is nothing globally
-		 * preventing us from hashing (and we should therefore consider plans
-		 * with hashes).
-		 *
-		 * Executor doesn't support hashed aggregation with DISTINCT or ORDER
-		 * BY aggregates.  (Doing so would imply storing *all* the input
-		 * values in the hash table, and/or running many sorts in parallel,
-		 * either of which seems like a certain loser.)  We similarly don't
-		 * support ordered-set aggregates in hashed aggregation, but that case
-		 * is also included in the numOrderedAggs count.
-		 *
-		 * Note: grouping_is_hashable() is much more expensive to check than
-		 * the other gating conditions, so we want to do it last.
-		 */
-		if ((parse->groupClause != NIL &&
-			 agg_costs->numOrderedAggs == 0 &&
-			 (gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
-			flags |= GROUPING_CAN_USE_HASH;
+			/*
+			 * Determine whether we should consider hash-based implementations of
+			 * grouping.
+			 *
+			 * Hashed aggregation only applies if we're grouping. If we have
+			 * grouping sets, some groups might be hashable but others not; in
+			 * this case we set can_hash true as long as there is nothing globally
+			 * preventing us from hashing (and we should therefore consider plans
+			 * with hashes).
+			 *
+			 * Executor doesn't support hashed aggregation with DISTINCT or ORDER
+			 * BY aggregates.  (Doing so would imply storing *all* the input
+			 * values in the hash table, and/or running many sorts in parallel,
+			 * either of which seems like a certain loser.)  We similarly don't
+			 * support ordered-set aggregates in hashed aggregation, but that case
+			 * is also included in the numOrderedAggs count.
+			 *
+			 * Note: grouping_is_hashable() is much more expensive to check than
+			 * the other gating conditions, so we want to do it last.
+			 */
+			if ((parse->groupClause != NIL &&
+				 agg_costs->numOrderedAggs == 0 &&
+				 (gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
+				flags |= GROUPING_CAN_USE_HASH;
 
-		/*
-		 * Determine whether partial aggregation is possible.
-		 */
-		if (can_partial_agg(root, agg_costs))
-			flags |= GROUPING_CAN_PARTIAL_AGG;
+			/*
+			 * Determine whether partial aggregation is possible.
+			 */
+			if (can_partial_agg(root, agg_costs))
+				flags |= GROUPING_CAN_PARTIAL_AGG;
+		}
 
 		extra.flags = flags;
 		extra.target_parallel_safe = target_parallel_safe;
@@ -6524,9 +6534,40 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	ListCell   *lc;
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+	bool		group_input_unique = (extra->flags & GROUPING_INPUT_UNIQUE) != 0;
 	List	   *havingQual = (List *) extra->havingQual;
 	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
 
+	if (group_input_unique)
+	{
+		Path *path = input_rel->cheapest_total_path;
+		add_path(grouped_rel, (Path *) create_agg_path(root,
+													   grouped_rel,
+													   path,
+													   grouped_rel->reltarget,
+													   AGG_UNIQUE,
+													   AGGSPLIT_SIMPLE,
+													   parse->groupClause,
+													   havingQual,
+													   agg_costs,
+													   dNumGroups));
+
+		if (path != input_rel->cheapest_startup_path)
+		{
+			path = input_rel->cheapest_startup_path;
+			add_path(grouped_rel, (Path *) create_agg_path(root,
+														   grouped_rel,
+														   path,
+														   grouped_rel->reltarget,
+														   AGG_UNIQUE,
+														   AGGSPLIT_SIMPLE,
+														   parse->groupClause,
+														   havingQual,
+														   agg_costs,
+														   dNumGroups));
+		}
+		return;
+	}
 	if (can_sort)
 	{
 		/*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 41110ed888..010266ed4b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -761,7 +761,8 @@ typedef enum AggStrategy
 	AGG_PLAIN,					/* simple agg across all input rows */
 	AGG_SORTED,					/* grouped agg, input must be sorted */
 	AGG_HASHED,					/* grouped agg, use internal hashtable */
-	AGG_MIXED					/* grouped agg, hash and sort both used */
+	AGG_MIXED,					/* grouped agg, hash and sort both used */
+	AGG_UNIQUE					/* grouped agg, the group clause is unique */
 } AggStrategy;
 
 /*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 575353d86c..9f0cfaf094 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2475,6 +2475,7 @@ typedef struct JoinPathExtraData
 #define GROUPING_CAN_USE_SORT       0x0001
 #define GROUPING_CAN_USE_HASH       0x0002
 #define GROUPING_CAN_PARTIAL_AGG	0x0004
+#define GROUPING_INPUT_UNIQUE		0x0008
 
 /*
  * What kind of partitionwise aggregation is in use?
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 55f363f70c..e8300d9f37 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -828,6 +828,7 @@ typedef struct Agg
 	/* Note: planner provides numGroups & aggParams only in HASHED/MIXED case */
 	List	   *groupingSets;	/* grouping sets to use */
 	List	   *chain;			/* chained Agg/Sort nodes */
+	bool		input_unique;   /* the input is already unique */
 } Agg;
 
 /* ----------------
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 42bd180895..65912142f8 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2544,6 +2544,46 @@ set work_mem to default;
 ----+----+----
 (0 rows)
 
+create table agg_unique_1(pk int primary key,  b int);
+create table agg_unique_2(a int, unsortable_col xid);
+insert into agg_unique_2 values(1, '1'), (2, '2'), (2, '1');
+explain (costs off, verbose)  select pk, sum(b) from agg_unique_1
+group by pk;
+              QUERY PLAN               
+---------------------------------------
+ GroupAggregate
+   Output: pk, sum(b)
+   Group Key: agg_unique_1.pk
+   Input Unique: true
+   ->  Seq Scan on public.agg_unique_1
+         Output: pk, b
+(6 rows)
+
+explain (costs off, verbose) select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ GroupAggregate
+   Output: agg_unique_2.unsortable_col, count(*)
+   Group Key: agg_unique_2.unsortable_col
+   Input Unique: true
+   ->  HashAggregate
+         Output: agg_unique_2.unsortable_col
+         Group Key: agg_unique_2.unsortable_col
+         ->  Seq Scan on public.agg_unique_2
+               Output: agg_unique_2.a, agg_unique_2.unsortable_col
+(9 rows)
+
+select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+ unsortable_col | count 
+----------------+-------
+              2 |     1
+              1 |     1
+(2 rows)
+
 drop table agg_group_1;
 drop table agg_group_2;
 drop table agg_group_3;
@@ -2552,3 +2592,5 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_unique_1;
+drop table agg_unique_2;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 3446c3e9fd..3dc40b13d5 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1159,6 +1159,21 @@ set work_mem to default;
   union all
 (select * from agg_group_4 except select * from agg_hash_4);
 
+create table agg_unique_1(pk int primary key,  b int);
+create table agg_unique_2(a int, unsortable_col xid);
+insert into agg_unique_2 values(1, '1'), (2, '2'), (2, '1');
+
+explain (costs off, verbose)  select pk, sum(b) from agg_unique_1
+group by pk;
+
+explain (costs off, verbose) select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+
+select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+
 drop table agg_group_1;
 drop table agg_group_2;
 drop table agg_group_3;
@@ -1167,3 +1182,5 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_unique_1;
+drop table agg_unique_2;
-- 
2.21.0

Attachment: v8-0002-Introuduce-RelOptInfo.uniquekeys-attribute.patch (application/octet-stream)
From b3d75728c33d18dd6403456666ecb47c61b8b8fd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 11 May 2020 15:50:52 +0800
Subject: [PATCH v8 2/7] Introuduce RelOptInfo.uniquekeys attribute

---
 src/backend/nodes/list.c                    |   31 +
 src/backend/nodes/makefuncs.c               |   21 +
 src/backend/optimizer/path/Makefile         |    3 +-
 src/backend/optimizer/path/README.uniquekey |  130 +++
 src/backend/optimizer/path/allpaths.c       |    7 +
 src/backend/optimizer/path/joinpath.c       |    9 +-
 src/backend/optimizer/path/joinrels.c       |    2 +
 src/backend/optimizer/path/pathkeys.c       |    3 +-
 src/backend/optimizer/path/uniquekeys.c     | 1119 +++++++++++++++++++
 src/backend/optimizer/plan/planner.c        |   13 +-
 src/backend/optimizer/prep/prepunion.c      |    2 +
 src/backend/optimizer/util/appendinfo.c     |   44 +
 src/include/nodes/makefuncs.h               |    3 +
 src/include/nodes/nodes.h                   |    1 +
 src/include/nodes/pathnodes.h               |   26 +
 src/include/nodes/pg_list.h                 |    2 +
 src/include/optimizer/appendinfo.h          |    3 +
 src/include/optimizer/optimizer.h           |    2 +
 src/include/optimizer/paths.h               |   43 +
 19 files changed, 1451 insertions(+), 13 deletions(-)
 create mode 100644 src/backend/optimizer/path/README.uniquekey
 create mode 100644 src/backend/optimizer/path/uniquekeys.c

diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 80fa8c84e4..a7a99b70f2 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -687,6 +687,37 @@ list_member_oid(const List *list, Oid datum)
 	return false;
 }
 
+/*
+ * return true iff every entry in "members" list is also present
+ * in the "target" list.
+ */
+bool
+list_is_subset(const List *members, const List *target)
+{
+	const ListCell	*lc1, *lc2;
+
+	Assert(IsPointerList(members));
+	Assert(IsPointerList(target));
+	check_list_invariants(members);
+	check_list_invariants(target);
+
+	foreach(lc1, members)
+	{
+		bool found = false;
+		foreach(lc2, target)
+		{
+			if (equal(lfirst(lc1), lfirst(lc2)))
+			{
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			return false;
+	}
+	return true;
+}
+
 /*
  * Delete the n'th cell (counting from 0) in list.
  *
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index b442b5a29e..3155e5c372 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -812,3 +812,24 @@ makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols)
 	v->va_cols = va_cols;
 	return v;
 }
+
+
+/*
+ * makeUniqueKey
+ */
+UniqueKey*
+makeUniqueKey(List *exprs, bool multi_nullvals, bool onerow)
+{
+	UniqueKey * ukey = makeNode(UniqueKey);
+	if (onerow)
+	{
+		Assert(exprs == NIL);
+		Assert(!multi_nullvals);
+	}
+	else
+		Assert(exprs != NIL);
+	ukey->exprs = exprs;
+	ukey->multi_nullvals = multi_nullvals;
+	ukey->onerow = onerow;
+	return ukey;
+}
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..7b9820c25f 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekeys.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/README.uniquekey b/src/backend/optimizer/path/README.uniquekey
new file mode 100644
index 0000000000..965eb3ffce
--- /dev/null
+++ b/src/backend/optimizer/path/README.uniquekey
@@ -0,0 +1,130 @@
+1. What is UniqueKey?
+We can think of a UniqueKey as a set of exprs for a RelOptInfo which we are sure
+does not yield duplicate values across all the rows. The simplest UniqueKey
+example is a primary key.
+
+However, we define the UniqueKey as below.
+
+typedef struct UniqueKey
+{
+        NodeTag	type;
+        List	*exprs;
+        bool	multi_nullvals;
+        bool	onerow;
+} UniqueKey;
+
+multi_nullvals is used to track whether the exprs allow multiple null values. A
+unique index whose columns are not marked NOT NULL allows multi_nullvals.
+
+onerow is also a kind of UniqueKey which means the RelOptInfo will have 1 row at
+most. It has a stronger semantic than the others. In SELECT uk FROM t; uk is a
+normal unique key and may have different values, while in
+SELECT colx FROM t WHERE uk = const;  colx is unique AND we have only 1 value. This
+field can be used for innerrel_is_unique, and also as an optimization for
+this case: we don't need to maintain multiple UniqueKeys, we just maintain one with
+onerow = true and exprs = NIL.
+
+onerow is set to true for only 2 cases right now: 1) SELECT .. FROM t WHERE uk =
+1;  2) SELECT aggref(xx) FROM t;  -- without GROUP BY.
+
+The UniqueKey can be used at the following cases at least:
+1. remove_useless_joins.
+2. reduce_unique_semijoins.
+3. remove distinct node if distinct clause is unique.
+4. remove aggnode if group by clause is unique.
+5. Aggregation push down without 2-phase aggregation if the join can't
+   duplicate the aggregated rows. (work-in-progress feature)
+
+
+2. How is it maintained?
+
+We have a set of populate_xxx_uniquekeys functions to maintain the uniquekeys in
+various cases. xxx includes baserel, joinrel, partitionedrel, distinctrel,
+groupedrel, and unionrel. We also need to convert the uniquekeys from a subquery
+to the outer relation, which is what convert_subquery_uniquekeys does.
+
+The first part is about baserel. We handle 3 cases. Suppose we have a unique
+index on (a, b).
+
+1. SELECT a, b FROM t.  UniqueKey (a, b)
+2. SELECT a FROM t WHERE b = 1;  UniqueKey (a)
+3. SELECT .. FROM t WHERE a = 1 AND b = 1;  UniqueKey (NIL).  The onerow case;
+   every column is unique.
+
+The next part is joinrel. This part is the most error-prone; we simplified the
+rules as below:
+1. If the relation's UniqueKey can't be duplicated after the join, then it will
+   still be valid for the join rel. The function we use here is
+   innerrel_keeps_unique. The basic idea is innerrel.any_col = outer.uk.
+
+2. If the UniqueKey can't stay valid via rule 1, the combination of the
+   UniqueKeys from both sides is valid for sure.  We can prove this as follows:
+   if the unique exprs from rel1 are duplicated by rel2, the duplicated rows
+   must contain different unique exprs from rel2.
+
+More considerations about onerow:
+1. If a relation has one row and it can't be duplicated, it may still contain
+   multi_nullvals after an outer join.
+2. If either UniqueKey can be duplicated after the join, we can get one row
+   only when both sides are one row AND there is no outer join.
+3. Whenever the onerow UniqueKey is not valid any more, we need to convert the
+   one-row UniqueKey to a normal unique key since we don't store exprs for a
+   one-row relation. get_exprs_from_uniquekeys will be used here.
+
+
+More considerations about multi_nullvals after join:
+1. If the original UniqueKey has multi_nullvals, the final UniqueKey will have
+   multi_nullvals in any case.
+2. If a unique key doesn't allow multi_nullvals, it may start to allow them
+   after some outer joins.
+
+
+When we come to a subquery, we need convert_subquery_uniquekeys just like
+convert_subquery_pathkeys.  Only a UniqueKey inside the subquery that is referenced
+as a Var in the outer relation will be reused. The relationship between the outerrel.Var
+and subquery.exprs is built with outerrel->subroot->processed_tlist.
+
+
+As for SRFs, they break the uniqueness of a uniquekey. However, they are
+handled in adjust_paths_for_srfs, which happens after query_planner, so
+we maintain the UniqueKey until there and reset it to NIL at that
+place. This can't help the distinct/group-by elimination cases but probably helps
+in some other cases, like reduce_unique_semijoins/remove_useless_joins, and it is
+semantically correct.
+
+
+As for inheritance tables, we first maintain the UniqueKey on the childrel as
+well. But for a partitioned table we need to maintain 2 different kinds of
+UniqueKey: 1). UniqueKey on the parent relation; 2). UniqueKey on the child
+relation for partition-wise queries.
+
+Example:
+CREATE TABLE p (a int not null, b int not null) partition by list (a);
+CREATE TABLE p0 partition of p for values in (1);
+CREATE TABLE p1 partition of p for values in (2);
+
+create unique index p0_b on p0(b);
+create unique index p1_b on p1(b);
+
+Now b is only unique at the partition level, so the distinct can't be removed in
+the following case: SELECT DISTINCT b FROM p;
+
+Another example is SELECT DISTINCT a, b FROM p WHERE a = 1; Since only one
+partition is chosen, the UniqueKey on the child relation is the same as the
+UniqueKey on the parent relation.
+
+Another usage of UniqueKey at the partition level is that it can be helpful
+for partition-wise join.
+
+As for the UniqueKey at the parent table level, it comes in 2 different ways:
+1). the UniqueKey is derived from a unique index, but the index must be the same
+in all the related child relations and the unique index must contain the
+Partition Key in it. Example:
+
+CREATE UNIQUE INDEX p_ab ON p(a, b);  -- where a is the partition key.
+
+-- Query
+SELECT a, b FROM p;  (a, b) is a UniqueKey of p.
+
+ 2). If the parent relation has only one childrel, the UniqueKey on the childrel
+ is the UniqueKey on the parent as well.
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 44c06cfff2..aaa820d68d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -786,6 +786,9 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 
 	/* Consider TID scans */
 	create_tidscan_paths(root, rel);
+
+	/* Set UniqueKeys for this relation */
+	populate_baserel_uniquekeys(root, rel, rel->indexlist);
 }
 
 /*
@@ -1307,6 +1310,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Add paths to the append relation. */
 	add_paths_to_append_rel(root, rel, live_childrels);
+	if (IS_PARTITIONED_REL(rel))
+		populate_partitionedrel_uniquekeys(root, rel, live_childrels);
 }
 
 
@@ -2380,6 +2385,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 										  pathkeys, required_outer));
 	}
 
+	convert_subquery_uniquekeys(root, rel, sub_final_rel);
+
 	/* If outer rel allows parallelism, do same for partial paths. */
 	if (rel->consider_parallel && bms_is_empty(required_outer))
 	{
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index db54a6ba2e..ef0fd2fb0b 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -71,13 +71,6 @@ static void consider_parallel_mergejoin(PlannerInfo *root,
 static void hash_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
-static List *select_mergejoin_clauses(PlannerInfo *root,
-									  RelOptInfo *joinrel,
-									  RelOptInfo *outerrel,
-									  RelOptInfo *innerrel,
-									  List *restrictlist,
-									  JoinType jointype,
-									  bool *mergejoin_allowed);
 static void generate_mergejoin_paths(PlannerInfo *root,
 									 RelOptInfo *joinrel,
 									 RelOptInfo *innerrel,
@@ -1927,7 +1920,7 @@ hash_inner_and_outer(PlannerInfo *root,
  * if it is mergejoinable and involves vars from the two sub-relations
  * currently of interest.
  */
-static List *
+List *
 select_mergejoin_clauses(PlannerInfo *root,
 						 RelOptInfo *joinrel,
 						 RelOptInfo *outerrel,
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 4e1650994d..c0d2332caf 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -924,6 +924,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
 
 	/* Apply partitionwise join technique, if possible. */
 	try_partitionwise_join(root, rel1, rel2, joinrel, sjinfo, restrictlist);
+
+	populate_joinrel_uniquekeys(root, joinrel, rel1, rel2, restrictlist, sjinfo->jointype);
 }
 
 
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 21e3f5a987..c56536a0f9 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -33,7 +33,6 @@ static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
 											 int partkeycol);
-static Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 
 
@@ -1035,7 +1034,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
  * We need this to ensure that we don't return pathkeys describing values
  * that are unavailable above the level of the subquery scan.
  */
-static Var *
+Var *
 find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle)
 {
 	ListCell   *lc;
diff --git a/src/backend/optimizer/path/uniquekeys.c b/src/backend/optimizer/path/uniquekeys.c
new file mode 100644
index 0000000000..99faad2081
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekeys.c
@@ -0,0 +1,1119 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekeys.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/appendinfo.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/tlist.h"
+#include "rewrite/rewriteManip.h"
+
+
+/*
+ * This struct is used to help populate_joinrel_uniquekeys.
+ *
+ * added_to_joinrel is true if a uniquekey (from outerrel or innerrel)
+ * has been added to joinrel.
+ * useful is true if the exprs of the uniquekey still appear in joinrel.
+ */
+typedef struct UniqueKeyContextData
+{
+	UniqueKey	*uniquekey;
+	bool	added_to_joinrel;
+	bool	useful;
+} *UniqueKeyContext;
+
+static List *initililze_uniquecontext_for_joinrel(RelOptInfo *inputrel);
+static bool innerrel_keeps_unique(PlannerInfo *root,
+								  RelOptInfo *outerrel,
+								  RelOptInfo *innerrel,
+								  List *restrictlist,
+								  bool reverse);
+
+static List *get_exprs_from_uniqueindex(IndexOptInfo *unique_index,
+										List *const_exprs,
+										List *const_expr_opfamilies,
+										Bitmapset *used_varattrs,
+										bool *useful,
+										bool *multi_nullvals);
+static List *get_exprs_from_uniquekey(RelOptInfo *joinrel,
+									  RelOptInfo *rel1,
+									  UniqueKey *ukey);
+static void add_uniquekey_for_onerow(RelOptInfo *rel);
+static bool add_combined_uniquekey(RelOptInfo *joinrel,
+								   RelOptInfo *outer_rel,
+								   RelOptInfo *inner_rel,
+								   UniqueKey *outer_ukey,
+								   UniqueKey *inner_ukey,
+								   JoinType jointype);
+
+/* Used for unique indexes checking for partitioned table */
+static bool index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind);
+static IndexOptInfo *simple_copy_indexinfo_to_parent(PlannerInfo *root,
+													 RelOptInfo *parentrel,
+													 IndexOptInfo *from);
+static bool simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2);
+static void adjust_partition_unique_indexlist(PlannerInfo *root,
+											  RelOptInfo *parentrel,
+											  RelOptInfo *childrel,
+											  List **global_unique_index);
+
+/* Helper function for grouped relation and distinct relation. */
+static void add_uniquekey_from_sortgroups(PlannerInfo *root,
+										  RelOptInfo *rel,
+										  List *sortgroups);
+
+/*
+ * populate_baserel_uniquekeys
+ *		Populate 'baserel' uniquekeys list by looking at the rel's unique
+ *		indexes and baserestrictinfo
+ */
+void
+populate_baserel_uniquekeys(PlannerInfo *root,
+							RelOptInfo *baserel,
+							List *indexlist)
+{
+	ListCell *lc;
+	List	*matched_uniq_indexes = NIL;
+
+	/* Attrs appearing in rel->reltarget->exprs. */
+	Bitmapset *used_attrs = NULL;
+
+	List	*const_exprs = NIL;
+	List	*expr_opfamilies = NIL;
+
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	foreach(lc, indexlist)
+	{
+		IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+		matched_uniq_indexes = lappend(matched_uniq_indexes, ind);
+	}
+
+	if (matched_uniq_indexes  == NIL)
+		return;
+
+	/* Check which attrs are used in baserel->reltarget */
+	pull_varattnos((Node *)baserel->reltarget->exprs, baserel->relid, &used_attrs);
+
+	/* Check which attrnos are used in a mergeable const filter */
+	foreach(lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (rinfo->mergeopfamilies == NIL)
+			continue;
+
+		if (bms_is_empty(rinfo->left_relids))
+		{
+			const_exprs = lappend(const_exprs, get_rightop(rinfo->clause));
+		}
+		else if (bms_is_empty(rinfo->right_relids))
+		{
+			const_exprs = lappend(const_exprs, get_leftop(rinfo->clause));
+		}
+		else
+			continue;
+
+		expr_opfamilies = lappend(expr_opfamilies, rinfo->mergeopfamilies);
+	}
+
+	foreach(lc, matched_uniq_indexes)
+	{
+		bool	multi_nullvals, useful;
+		List	*exprs = get_exprs_from_uniqueindex(lfirst_node(IndexOptInfo, lc),
+													const_exprs,
+													expr_opfamilies,
+													used_attrs,
+													&useful,
+													&multi_nullvals);
+		if (useful)
+		{
+			if (exprs == NIL)
+			{
+				/* All the columns in Unique Index matched with a restrictinfo */
+				add_uniquekey_for_onerow(baserel);
+				return;
+			}
+			baserel->uniquekeys = lappend(baserel->uniquekeys,
+										  makeUniqueKey(exprs, multi_nullvals, false));
+		}
+	}
+}
+
+
+/*
+ * populate_partitionedrel_uniquekeys
+ * A unique index can only be used for a UniqueKey if:
+ * 1). It includes the partition key.
+ * 2). It exists in all the related partitions.
+ */
+void
+populate_partitionedrel_uniquekeys(PlannerInfo *root,
+								   RelOptInfo *rel,
+								   List *childrels)
+{
+	ListCell	*lc;
+	List	*global_uniq_indexlist = NIL;
+	RelOptInfo *childrel;
+	bool is_first = true;
+
+	Assert(IS_PARTITIONED_REL(rel));
+
+	if (childrels == NIL)
+		return;
+
+	childrel = linitial_node(RelOptInfo, childrels);
+	foreach(lc, childrel->indexlist)
+	{
+		IndexOptInfo *ind = lfirst(lc);
+		IndexOptInfo *modified_index;
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+
+		/*
+		 * During simple_copy_indexinfo_to_parent, we need to convert vars from
+		 * child vars to parent vars; indexes on expressions are too complex to
+		 * handle, so ignore them for now.
+		 */
+		if (ind->indexprs != NIL)
+			continue;
+
+		modified_index = simple_copy_indexinfo_to_parent(root, rel, ind);
+		/*
+		 * If the unique index doesn't contain partkey, then it is unique
+		 * on this partition only, so it is useless for us.
+		 */
+		if (!index_constains_partkey(rel, modified_index))
+			continue;
+
+		global_uniq_indexlist = lappend(global_uniq_indexlist,  modified_index);
+	}
+
+	if (global_uniq_indexlist != NIL)
+	{
+		foreach(lc, childrels)
+		{
+			RelOptInfo *child = lfirst(lc);
+			if (is_first)
+			{
+				is_first = false;
+				continue;
+			}
+			adjust_partition_unique_indexlist(root, rel, child, &global_uniq_indexlist);
+		}
+		/* Now we have a list of unique indexes which are exactly the same on all
+		 * childrels. Set the UniqueKey just like it is a non-partitioned table.
+		 */
+		populate_baserel_uniquekeys(root, rel, global_uniq_indexlist);
+	}
+
+	/*
+	 * If there is only one partition used in this query, the UniqueKey on the
+	 * childrel is still valid at the parent level.
+	 */
+	if (list_length(childrels) == 1)
+	{
+		RelOptInfo *childrel = linitial_node(RelOptInfo, childrels);
+		ListCell	*lc;
+		Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
+		if (relation_is_onerow(childrel))
+		{
+			add_uniquekey_for_onerow(rel);
+			return;
+		}
+
+		/*
+		 * XXX: this code is called at set_append_rel_pathlist for a partitioned relation.
+		 * Is it possible to join FIRST and build the append relation in a partition-wise
+		 * join? If yes, the UniqueKey may come from more than one relation, so we would
+		 * need to tell which exprs come from the partitioned relation. I suppose that doesn't happen now.
+		 */
+		Assert(bms_num_members(childrel->relids) == 1);
+		foreach(lc, childrel->uniquekeys)
+		{
+			UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+			ListCell	*lc2;
+			AppendRelInfo *appinfo = find_appinfo_by_child(root, childrel->relid);
+			List *parent_exprs = NIL;
+			bool can_reuse = true;
+			foreach(lc2, ukey->exprs)
+			{
+				Var *var = (Var *)lfirst(lc2);
+				/*
+				 * If the var comes from an expression, it is hard to build the expression
+				 * in the parent, so ignore that case for now.
+				 */
+				if(!IsA(var, Var))
+				{
+					can_reuse = false;
+					break;
+				}
+				/* Convert it to parent var */
+				parent_exprs = lappend(parent_exprs, find_parent_var(appinfo, var));
+			}
+			if (can_reuse)
+				rel->uniquekeys = lappend(rel->uniquekeys,
+										  makeUniqueKey(parent_exprs,
+														ukey->multi_nullvals,
+														ukey->onerow));
+		}
+	}
+}
+
+
+
+/*
+ * populate_distinctrel_uniquekeys
+ */
+void
+populate_distinctrel_uniquekeys(PlannerInfo *root,
+								RelOptInfo *inputrel,
+								RelOptInfo *distinctrel)
+{
+	/* The unique key before the distinct is still valid. */
+	distinctrel->uniquekeys = list_copy(inputrel->uniquekeys);
+	add_uniquekey_from_sortgroups(root, distinctrel, root->parse->distinctClause);
+}
+
+/*
+ * populate_grouprel_uniquekeys
+ */
+void
+populate_grouprel_uniquekeys(PlannerInfo *root,
+							 RelOptInfo *grouprel,
+							 RelOptInfo *inputrel)
+
+{
+	Query *parse = root->parse;
+	bool input_ukey_added = false;
+	ListCell *lc;
+
+	if (parse->groupingSets)
+		return;
+
+	/* A Normal group by without grouping set. */
+	if (parse->groupClause)
+	{
+		/* Currently, even if the groupby clause is unique already, if the query has an aggref
+		 * we have to create grouprel as well. To keep the UniqueKey->exprs shortest,
+		 * we check if the UniqueKey of input_rel is still valid; if so, we reuse it.
+		 */
+		foreach(lc, inputrel->uniquekeys)
+		{
+			UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+			if (ukey->onerow)
+			{
+				add_uniquekey_for_onerow(grouprel);
+				return;
+			}
+			if (list_is_subset(ukey->exprs, grouprel->reltarget->exprs))
+			{
+				grouprel->uniquekeys = lappend(grouprel->uniquekeys,
+											   ukey);
+				input_ukey_added = true;
+			}
+		}
+		if (!input_ukey_added)
+			add_uniquekey_from_sortgroups(root,
+										  grouprel,
+										  root->parse->groupClause);
+	}
+	else
+		/* It has aggregation but without a group by, so only one row returned */
+		add_uniquekey_for_onerow(grouprel);
+}
+
+/*
+ * simple_copy_uniquekeys
+ * Using a function for this one-line code makes it easy to check where we simply
+ * copied the uniquekeys.
+ */
+void
+simple_copy_uniquekeys(RelOptInfo *oldrel,
+					   RelOptInfo *newrel)
+{
+	newrel->uniquekeys = oldrel->uniquekeys;
+}
+
+/*
+ *  populate_unionrel_uniquekeys
+ */
+void
+populate_unionrel_uniquekeys(PlannerInfo *root,
+							  RelOptInfo *unionrel)
+{
+	ListCell	*lc;
+	List	*exprs = NIL;
+
+	Assert(unionrel->uniquekeys == NIL);
+
+	foreach(lc, unionrel->reltarget->exprs)
+	{
+		exprs = lappend(exprs, lfirst(lc));
+	}
+
+	if (exprs == NIL)
+		/* SQL: select union select; is valid, we need to handle it here. */
+		return;
+	unionrel->uniquekeys = lappend(unionrel->uniquekeys,
+								   makeUniqueKey(exprs,
+												 false, /* allows_multinull */
+												 false  /* onerow */));
+}
+
+/*
+ * populate_joinrel_uniquekeys
+ *
+ * Populate uniquekeys for the joinrel. We check each relation to see if its
+ * UniqueKey is still valid via innerrel_keeps_unique; if so, we add it to the
+ * joinrel.  The multi_nullvals field will be changed from false to true
+ * for some outer join cases, and a one-row UniqueKey needs to be converted to a
+ * normal UniqueKey for the same cases as well.
+ *
+ * For a uniquekey in either baserel which can't stay unique after the join, we
+ * still check to see if the combination of UniqueKeys from both sides is useful
+ * for us. If yes, we add it to the joinrel as well.
+ */
+void
+populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List *restrictlist, JoinType jointype)
+{
+	ListCell *lc, *lc2;
+	List	*clause_list = NIL;
+	List	*outerrel_ukey_ctx;
+	List	*innerrel_ukey_ctx;
+	bool	inner_onerow, outer_onerow;
+	bool	mergejoin_allowed;
+
+	/* Care about the outer relation only for SEMI/ANTI join */
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+	{
+		foreach(lc, outerrel->uniquekeys)
+		{
+			UniqueKey	*uniquekey = lfirst_node(UniqueKey, lc);
+			if (list_is_subset(uniquekey->exprs, joinrel->reltarget->exprs))
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, uniquekey);
+		}
+		return;
+	}
+
+	Assert(jointype == JOIN_LEFT || jointype == JOIN_FULL || jointype == JOIN_INNER);
+
+	/* Fast path */
+	if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+		return;
+
+	inner_onerow = relation_is_onerow(innerrel);
+	outer_onerow = relation_is_onerow(outerrel);
+
+	outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(outerrel);
+	innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(innerrel);
+
+	clause_list = select_mergejoin_clauses(root, joinrel, outerrel, innerrel,
+										   restrictlist, jointype,
+										   &mergejoin_allowed);
+
+	if (innerrel_keeps_unique(root, innerrel, outerrel, clause_list, true /* reverse */))
+	{
+		foreach(lc, outerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+
+			/*
+			 * The outer relation has one row and its unique key is not
+			 * duplicated by the join, so the joinrel still has one row
+			 * unless jointype == JOIN_FULL.
+			 */
+			if (outer_onerow && jointype != JOIN_FULL)
+			{
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else if (outer_onerow)
+			{
+				/*
+				 * Full join case: the one row becomes multiple rows and
+				 * multi_nullvals becomes true.  We also need to set the
+				 * exprs correctly since the rel is no longer one-row.
+				 */
+				ListCell *lc2;
+				foreach(lc2, get_exprs_from_uniquekey(joinrel, outerrel, NULL))
+				{
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(lfirst(lc2),
+																true, /* multi_nullvals */
+																false /* onerow */));
+				}
+			}
+			else
+			{
+				if (!ctx->uniquekey->multi_nullvals && jointype == JOIN_FULL)
+					/* Change multi_nullvals to true due to the full join. */
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(ctx->uniquekey->exprs,
+																true,
+																ctx->uniquekey->onerow));
+				else
+					/* Just reuse it */
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  ctx->uniquekey);
+			}
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+	{
+		foreach(lc, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+
+			/*
+			 * The inner relation has one row and its unique key is not
+			 * duplicated by the join, so the joinrel still has one row
+			 * unless the jointype is JOIN_FULL or JOIN_LEFT.
+			 */
+			if (inner_onerow && jointype != JOIN_FULL && jointype != JOIN_LEFT)
+			{
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else if (inner_onerow)
+			{
+				/*
+				 * Full join or left join case: the inner side's one row
+				 * becomes multiple rows and multi_nullvals becomes true.  We
+				 * also need to set the exprs correctly since the rel is no
+				 * longer one-row.
+				 */
+				ListCell *lc2;
+				foreach(lc2, get_exprs_from_uniquekey(joinrel, innerrel, NULL))
+				{
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(lfirst(lc2),
+																true, /* multi_nullvals */
+																false /* onerow */));
+				}
+			}
+			else
+			{
+				if (!ctx->uniquekey->multi_nullvals &&
+					(jointype == JOIN_FULL || jointype == JOIN_LEFT))
+					/* Need to change multi_nullvals to true due to the outer join. */
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(ctx->uniquekey->exprs,
+																true,
+																ctx->uniquekey->onerow));
+				else
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  ctx->uniquekey);
+
+			}
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	/*
+	 * The combination of the UniqueKeys from both sides is unique as well,
+	 * regardless of the join type, but don't bother adding it if a subset of
+	 * it has already been added to the joinrel, or if it is not useful for
+	 * the joinrel.
+	 */
+	foreach(lc, outerrel_ukey_ctx)
+	{
+		UniqueKeyContext ctx1 = (UniqueKeyContext) lfirst(lc);
+		if (ctx1->added_to_joinrel || !ctx1->useful)
+			continue;
+		foreach(lc2, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx2 = (UniqueKeyContext) lfirst(lc2);
+			if (ctx2->added_to_joinrel || !ctx2->useful)
+				continue;
+			if (add_combined_uniquekey(joinrel, outerrel, innerrel,
+									   ctx1->uniquekey, ctx2->uniquekey,
+									   jointype))
+				/* If we added a onerow UniqueKey to the joinrel, we need no others. */
+				return;
+		}
+	}
+}
+
+
+/*
+ * convert_subquery_uniquekeys
+ *
+ * Convert the UniqueKeys of the subquery into UniqueKeys of the outer relation.
+ */
+void convert_subquery_uniquekeys(PlannerInfo *root,
+								 RelOptInfo *currel,
+								 RelOptInfo *sub_final_rel)
+{
+	ListCell	*lc;
+
+	if (sub_final_rel->uniquekeys == NIL)
+		return;
+
+	if (relation_is_onerow(sub_final_rel))
+	{
+		add_uniquekey_for_onerow(currel);
+		return;
+	}
+
+	Assert(currel->subroot != NULL);
+
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		ListCell	*lc;
+		List	*exprs = NIL;
+		bool	ukey_useful = true;
+
+		/* One row case is handled above */
+		Assert(ukey->exprs != NIL);
+		foreach(lc, ukey->exprs)
+		{
+			Var *var;
+			TargetEntry *tle = tlist_member(lfirst(lc),
+											currel->subroot->processed_tlist);
+			if (tle == NULL)
+			{
+				ukey_useful = false;
+				break;
+			}
+			var = find_var_for_subquery_tle(currel, tle);
+			if (var == NULL)
+			{
+				ukey_useful = false;
+				break;
+			}
+			exprs = lappend(exprs, var);
+		}
+
+		if (ukey_useful)
+			currel->uniquekeys = lappend(currel->uniquekeys,
+										 makeUniqueKey(exprs,
+													   ukey->multi_nullvals,
+													   ukey->onerow));
+	}
+}
+
+/*
+ * innerrel_keeps_unique
+ *
+ * Check whether the innerrel's UniqueKeys remain valid after the join.  They
+ * remain valid if every expr of some UniqueKey of the outerrel is matched,
+ * via a mergeable equality operator, by a clause in clause_list.
+ *
+ * Note: clause_list must already contain only mergeable RestrictInfos.
+ */
+static bool
+innerrel_keeps_unique(PlannerInfo *root,
+					  RelOptInfo *outerrel,
+					  RelOptInfo *innerrel,
+					  List *clause_list,
+					  bool reverse)
+{
+	ListCell	*lc, *lc2, *lc3;
+
+	if (outerrel->uniquekeys == NIL || innerrel->uniquekeys == NIL)
+		return false;
+
+	/* Check if there is outerrel's uniquekey in mergeable clause. */
+	foreach(lc, outerrel->uniquekeys)
+	{
+		List	*outer_uq_exprs = lfirst_node(UniqueKey, lc)->exprs;
+		bool clauselist_matchs_all_exprs = true;
+		foreach(lc2, outer_uq_exprs)
+		{
+			Node *outer_uq_expr = lfirst(lc2);
+			bool find_uq_expr_in_clauselist = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *outer_expr;
+				if (reverse)
+					outer_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				else
+					outer_expr = rinfo->outer_is_left ? get_leftop(rinfo->clause) : get_rightop(rinfo->clause);
+				if (equal(outer_expr, outer_uq_expr))
+				{
+					find_uq_expr_in_clauselist = true;
+					break;
+				}
+			}
+			if (!find_uq_expr_in_clauselist)
+			{
+				/* No need to check the next exprs in the current uniquekey */
+				clauselist_matchs_all_exprs = false;
+				break;
+			}
+		}
+
+		if (clauselist_matchs_all_exprs)
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * relation_is_onerow
+ * Check whether the relation returns exactly one row, according to its UniqueKeys.
+ */
+bool
+relation_is_onerow(RelOptInfo *rel)
+{
+	UniqueKey *ukey;
+	if (rel->uniquekeys == NIL)
+		return false;
+	ukey = linitial_node(UniqueKey, rel->uniquekeys);
+	if (ukey->onerow)
+	{
+		/* Some cheap sanity checks on the UniqueKey. */
+
+		/* 1. We will only store one UniqueKey for this rel. */
+		Assert(list_length(rel->uniquekeys) == 1);
+		/* 2. multi_nullvals must be false. */
+		Assert(!ukey->multi_nullvals);
+		/* 3. exprs must be NIL. */
+		Assert(ukey->exprs == NIL);
+
+	}
+	return ukey->onerow;
+}
+
+/*
+ * relation_has_uniquekeys_for
+ *		Returns true if we have proofs that 'rel' cannot return multiple rows with
+ *		the same values in each of 'exprs'.  Otherwise returns false.
+ */
+bool
+relation_has_uniquekeys_for(PlannerInfo *root, RelOptInfo *rel,
+							List *exprs, bool allow_multinulls)
+{
+	ListCell *lc;
+
+	/* In the UniqueKey->onerow case, uniquekey->exprs is NIL as well,
+	 * so we can't rely on list_is_subset to handle that special case.
+	 */
+	if (exprs == NIL)
+		return false;
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		if (ukey->multi_nullvals && !allow_multinulls)
+			continue;
+		if (list_is_subset(ukey->exprs, exprs))
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * get_exprs_from_uniqueindex
+ *
+ * Return a list of exprs which the given unique index proves unique.  Set
+ * *useful to false if the unique index is not usable for us.
+ */
+static List *
+get_exprs_from_uniqueindex(IndexOptInfo *unique_index,
+							List *const_exprs,
+							List *const_expr_opfamilies,
+							Bitmapset *used_varattrs,
+							bool *useful,
+							bool *multi_nullvals)
+{
+	List	*exprs = NIL;
+	ListCell	*indexpr_item;
+	int	c = 0;
+
+	*useful = true;
+	*multi_nullvals = false;
+
+	indexpr_item = list_head(unique_index->indexprs);
+	for(c = 0; c < unique_index->ncolumns; c++)
+	{
+		int attr = unique_index->indexkeys[c];
+		Expr *expr;
+		bool	matched_const = false;
+		ListCell	*lc1, *lc2;
+
+		/* Index on system column is not supported */
+		Assert(attr >= 0);
+
+		if(attr > 0)
+		{
+			expr = list_nth_node(TargetEntry, unique_index->indextlist, c)->expr;
+		}
+		else if (attr == 0)
+		{
+			expr = lfirst(indexpr_item);
+			indexpr_item = lnext(unique_index->indexprs, indexpr_item);
+		}
+
+		/* Check whether this index column is equated to a Const */
+		forboth(lc1, const_exprs, lc2, const_expr_opfamilies)
+		{
+			if (list_member_oid((List *)lfirst(lc2), unique_index->opfamily[c])
+				&& match_index_to_operand((Node *) lfirst(lc1), c, unique_index))
+			{
+				matched_const = true;
+				break;
+			}
+		}
+
+		if (matched_const)
+			continue;
+
+		attr -= FirstLowInvalidHeapAttributeNumber;
+
+		if (!bms_is_member(attr, used_varattrs))
+		{
+			*useful = false;
+			break;
+		}
+
+		/* attr = 0 means whole row */
+		if (!bms_is_member(attr, unique_index->rel->notnullattrs)
+			&& !bms_is_member(0 - FirstLowInvalidHeapAttributeNumber,
+							  unique_index->rel->notnullattrs))
+		{
+			*multi_nullvals = true;
+		}
+
+		exprs = lappend(exprs, expr);
+	}
+	return exprs;
+}
+
+/*
+ * add_uniquekey_for_onerow
+ * If we are sure that the relation returns only one row, then all of its
+ * columns are unique.  However, we don't need to create a UniqueKey for every
+ * column; we just set UniqueKey->onerow to true and leave exprs as NIL.
+ */
+void
+add_uniquekey_for_onerow(RelOptInfo *rel)
+{
+	rel->uniquekeys = list_make1(makeUniqueKey(NIL, /* No need to set exprs */
+											   false, /* onerow can't have multi_nullvals */
+											   true));
+
+}
+
+/*
+ * initililze_uniquecontext_for_joinrel
+ * Return a List of UniqueKeyContext for an inputrel
+ */
+static List *
+initililze_uniquecontext_for_joinrel(RelOptInfo *inputrel)
+{
+	List	*res = NIL;
+	ListCell *lc;
+	foreach(lc,  inputrel->uniquekeys)
+	{
+		UniqueKeyContext context;
+		context = palloc(sizeof(struct UniqueKeyContextData));
+		context->uniquekey = lfirst_node(UniqueKey, lc);
+		context->added_to_joinrel = false;
+		context->useful = true;
+		res = lappend(res, context);
+	}
+	return res;
+}
+
+
+/*
+ * get_exprs_from_uniquekey
+ *	Unify the way of getting a List of exprs from a one-row UniqueKey or a
+ * normal UniqueKey.  Returns a List of expr lists.
+ *
+ * rel1: the relation whose exprs we want.
+ * ukey: the UniqueKey whose exprs we want; may be NULL in the one-row case.
+ */
+static List *
+get_exprs_from_uniquekey(RelOptInfo *joinrel, RelOptInfo *rel1, UniqueKey *ukey)
+{
+	ListCell *lc;
+	bool onerow = rel1 != NULL && relation_is_onerow(rel1);
+
+	List	*res = NIL;
+	Assert(onerow || ukey);
+	if (onerow)
+	{
+		/* Only care about exprs that still exist in the joinrel */
+		foreach(lc, joinrel->reltarget->exprs)
+		{
+			Bitmapset *relids = pull_varnos(lfirst(lc));
+			if (bms_is_subset(relids, rel1->relids))
+			{
+				res = lappend(res, list_make1(lfirst(lc)));
+			}
+		}
+	}
+	else
+	{
+		res = list_make1(ukey->exprs);
+	}
+	return res;
+}
+
+/*
+ * Partitioned table Unique Keys.
+ * A unique key is maintained for a partitioned table when:
+ * 1. The index is unique, as usual.
+ * 2. The index contains the partition key.
+ * 3. The same index exists on all the child rels; see simple_indexinfo_equal
+ *    for how we compare them.
+ */
+
+/* index_constains_partkey
+ * Return true if the index contains the partition key.
+ */
+static bool
+index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind)
+{
+	ListCell	*lc;
+	int	i;
+	Assert(IS_PARTITIONED_REL(partrel));
+
+	for(i = 0; i < partrel->part_scheme->partnatts; i++)
+	{
+		Node *part_expr = linitial(partrel->partexprs[i]);
+		bool found_in_index = false;
+		foreach(lc, ind->indextlist)
+		{
+			Expr *index_expr = lfirst_node(TargetEntry, lc)->expr;
+			if (equal(index_expr, part_expr))
+			{
+				found_in_index = true;
+				break;
+			}
+		}
+		if (!found_in_index)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * simple_indexinfo_equal
+ *
+ * Check whether the two indexes are the same.  The indexes here are COPIED
+ * from the childrels with some minor changes (see
+ * simple_copy_indexinfo_to_parent).
+ */
+static bool
+simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2)
+{
+	Size oid_cmp_len = sizeof(Oid) * ind1->ncolumns;
+
+	return ind1->ncolumns == ind2->ncolumns &&
+		ind1->unique == ind2->unique &&
+		memcmp(ind1->indexkeys, ind2->indexkeys, sizeof(int) * ind1->ncolumns) == 0 &&
+		memcmp(ind1->opfamily, ind2->opfamily, oid_cmp_len) == 0 &&
+		memcmp(ind1->opcintype, ind2->opcintype, oid_cmp_len) == 0 &&
+		memcmp(ind1->sortopfamily, ind2->sortopfamily, oid_cmp_len) == 0 &&
+		equal(get_tlist_exprs(ind1->indextlist, true),
+			  get_tlist_exprs(ind2->indextlist, true));
+}
+
+
+/*
+ * The macros below are used by simple_copy_indexinfo_to_parent, which is too
+ * specialized to put into copyfuncs.c, so the macros are copied here.
+ */
+#define COPY_POINTER_FIELD(fldname, sz) \
+	do { \
+		Size	_size = (sz); \
+		newnode->fldname = palloc(_size); \
+		memcpy(newnode->fldname, from->fldname, _size); \
+	} while (0)
+
+#define COPY_NODE_FIELD(fldname) \
+	(newnode->fldname = copyObjectImpl(from->fldname))
+
+#define COPY_SCALAR_FIELD(fldname) \
+	(newnode->fldname = from->fldname)
+
+
+/*
+ * simple_copy_indexinfo_to_parent (from a partition)
+ * Copy the IndexOptInfo from a child relation to the parent relation, with
+ * some modifications.  The copy is used to test:
+ * 1. whether the same index exists in all the childrels;
+ * 2. whether the parentrel's reltarget/baserestrictinfo matches this index.
+ */
+static IndexOptInfo *
+simple_copy_indexinfo_to_parent(PlannerInfo *root,
+								RelOptInfo *parentrel,
+								IndexOptInfo *from)
+{
+	IndexOptInfo *newnode = makeNode(IndexOptInfo);
+	AppendRelInfo *appinfo = find_appinfo_by_child(root, from->rel->relid);
+	ListCell	*lc;
+	int	idx = 0;
+
+	COPY_SCALAR_FIELD(ncolumns);
+	COPY_SCALAR_FIELD(nkeycolumns);
+	COPY_SCALAR_FIELD(unique);
+	COPY_SCALAR_FIELD(immediate);
+	/* We just need to know if it is NIL or not */
+	COPY_SCALAR_FIELD(indpred);
+	COPY_SCALAR_FIELD(predOK);
+	COPY_POINTER_FIELD(indexkeys, from->ncolumns * sizeof(int));
+	COPY_POINTER_FIELD(indexcollations, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opfamily, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opcintype, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(sortopfamily, from->ncolumns * sizeof(Oid));
+	COPY_NODE_FIELD(indextlist);
+
+	/* Convert index exprs on child expr to expr on parent */
+	foreach(lc, newnode->indextlist)
+	{
+		TargetEntry *tle = lfirst_node(TargetEntry, lc);
+		/* Index on expression is ignored */
+		Assert(IsA(tle->expr, Var));
+		tle->expr = (Expr *) find_parent_var(appinfo, (Var *) tle->expr);
+		newnode->indexkeys[idx] = castNode(Var, tle->expr)->varattno;
+		idx++;
+	}
+	newnode->rel = parentrel;
+	return newnode;
+}
+
+/*
+ * adjust_partition_unique_indexlist
+ *
+ * global_unique_indexes initially contains the copied-and-modified unique
+ * indexes from the first partition.  Check whether each index in it still
+ * exists in each following partition; if not, remove it.  At the end it holds
+ * the indexes that exist in all the partitions.
+ */
+static void
+adjust_partition_unique_indexlist(PlannerInfo *root,
+								  RelOptInfo *parentrel,
+								  RelOptInfo *childrel,
+								  List **global_unique_indexes)
+{
+	ListCell	*lc, *lc2;
+	foreach(lc, *global_unique_indexes)
+	{
+		IndexOptInfo	*g_ind = lfirst_node(IndexOptInfo, lc);
+		bool found_in_child = false;
+
+		foreach(lc2, childrel->indexlist)
+		{
+			IndexOptInfo   *p_ind = lfirst_node(IndexOptInfo, lc2);
+			IndexOptInfo   *p_ind_copy;
+			if (!p_ind->unique || !p_ind->immediate ||
+				(p_ind->indpred != NIL && !p_ind->predOK))
+				continue;
+			p_ind_copy = simple_copy_indexinfo_to_parent(root, parentrel, p_ind);
+			if (simple_indexinfo_equal(p_ind_copy, g_ind))
+			{
+				found_in_child = true;
+				break;
+			}
+		}
+		if (!found_in_child)
+			/* The same index doesn't exist on this childrel, remove it */
+			*global_unique_indexes = foreach_delete_current(*global_unique_indexes, lc);
+	}
+}
+
+/* Helper function for grouprel/distinctrel */
+static void
+add_uniquekey_from_sortgroups(PlannerInfo *root, RelOptInfo *rel, List *sortgroups)
+{
+	Query *parse = root->parse;
+	List	*exprs;
+
+	/* XXX: If any vars reference an upper query level, the semantics are
+	 * imprecise; should we avoid that?  levelsup = 1 is just a demo; maybe we
+	 * need to check every level other than 0, and if so we need to write
+	 * another pull_var_walker.
+	 */
+	List	*upper_vars = pull_vars_of_level((Node*)sortgroups, 1);
+
+	if (upper_vars != NIL)
+		return;
+
+	exprs = get_sortgrouplist_exprs(sortgroups, parse->targetList);
+	rel->uniquekeys = lappend(rel->uniquekeys,
+							  makeUniqueKey(exprs,
+											false, /* sortgroupclause can't be multi_nullvals */
+											false));
+
+}
+
+
+/*
+ * add_combined_uniquekey
+ * The combination of UniqueKeys from both sides is a valid UniqueKey for the
+ * joinrel regardless of the jointype.
+ * Note: this function is only called when neither side's UniqueKey remains
+ * valid on its own after the join.
+ */
+bool
+add_combined_uniquekey(RelOptInfo *joinrel,
+					   RelOptInfo *outer_rel,
+					   RelOptInfo *inner_rel,
+					   UniqueKey *outer_ukey,
+					   UniqueKey *inner_ukey,
+					   JoinType jointype)
+{
+
+	ListCell	*lc1, *lc2;
+
+	/* If either side has multi_nullvals, the combined UniqueKey has multi_nullvals */
+	bool multi_nullvals = outer_ukey->multi_nullvals || inner_ukey->multi_nullvals;
+
+	/* An outer join also implies multi_nullvals */
+	multi_nullvals = multi_nullvals || IS_OUTER_JOIN(jointype);
+
+	/* The only case in which the join result can still be one row */
+	if  (outer_ukey->onerow && inner_ukey->onerow && jointype == JOIN_INNER)
+	{
+		add_uniquekey_for_onerow(joinrel);
+		return true;
+	}
+
+	foreach(lc1, get_exprs_from_uniquekey(joinrel, outer_rel, outer_ukey))
+	{
+		foreach(lc2, get_exprs_from_uniquekey(joinrel, inner_rel, inner_ukey))
+		{
+			List *exprs = list_concat_copy(lfirst_node(List, lc1), lfirst_node(List, lc2));
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+										  makeUniqueKey(exprs,
+														multi_nullvals,
+														false));
+		}
+	}
+	return false;
+}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e664eb18c0..142b9782fb 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2389,6 +2389,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		add_path(final_rel, path);
 	}
 
+	simple_copy_uniquekeys(current_rel, final_rel);
+
 	/*
 	 * Generate partial paths for final_rel, too, if outer query levels might
 	 * be able to make use of them.
@@ -3899,6 +3901,8 @@ create_grouping_paths(PlannerInfo *root,
 	}
 
 	set_cheapest(grouped_rel);
+
+	populate_grouprel_uniquekeys(root, grouped_rel, input_rel);
 	return grouped_rel;
 }
 
@@ -4616,7 +4620,7 @@ create_window_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(window_rel);
-
+	simple_copy_uniquekeys(input_rel, window_rel);
 	return window_rel;
 }
 
@@ -4913,7 +4917,7 @@ create_distinct_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(distinct_rel);
-
+	populate_distinctrel_uniquekeys(root, input_rel, distinct_rel);
 	return distinct_rel;
 }
 
@@ -5174,6 +5178,8 @@ create_ordered_paths(PlannerInfo *root,
 	 */
 	Assert(ordered_rel->pathlist != NIL);
 
+	simple_copy_uniquekeys(input_rel, ordered_rel);
+
 	return ordered_rel;
 }
 
@@ -6051,6 +6057,9 @@ adjust_paths_for_srfs(PlannerInfo *root, RelOptInfo *rel,
 	if (list_length(targets) == 1)
 		return;
 
+	/* UniqueKey is not valid after handling the SRF. */
+	rel->uniquekeys = NIL;
+
 	/*
 	 * Stack SRF-evaluation nodes atop each path for the rel.
 	 *
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 951aed80e7..e94e92937c 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -689,6 +689,8 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	/* Undo effects of possibly forcing tuple_fraction to 0 */
 	root->tuple_fraction = save_fraction;
 
+	/* Add the UniqueKeys */
+	populate_unionrel_uniquekeys(root, result_rel);
 	return result_rel;
 }
 
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index d722063cf3..44c37ecffc 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -746,3 +746,47 @@ find_appinfos_by_relids(PlannerInfo *root, Relids relids, int *nappinfos)
 	}
 	return appinfos;
 }
+
+/*
+ * find_appinfo_by_child
+ *
+ * Return the AppendRelInfo whose child_relid equals child_index.
+ */
+AppendRelInfo *
+find_appinfo_by_child(PlannerInfo *root, Index child_index)
+{
+	ListCell	*lc;
+	foreach(lc, root->append_rel_list)
+	{
+		AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
+		if (appinfo->child_relid == child_index)
+			return appinfo;
+	}
+	elog(ERROR, "parent relation can't be found");
+	return NULL;
+}
+
+/*
+ * find_parent_var
+ *
+ * Translate a Var of a child rel into the corresponding Var of its parent.
+ */
+Var *
+find_parent_var(AppendRelInfo *appinfo, Var *child_var)
+{
+	ListCell	*lc;
+	Var	*res = NULL;
+	Index attno = 1;
+	foreach(lc, appinfo->translated_vars)
+	{
+		Node *child_node = lfirst(lc);
+		if (equal(child_node, child_var))
+		{
+			res = copyObject(child_var);
+			res->varattno = attno;
+			res->varno = appinfo->parent_relid;
+		}
+		attno++;
+	}
+	if (res == NULL)
+		elog(ERROR, "parent var can't be found");
+	return res;
+}
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 31d9aedeeb..33f6ddc948 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -16,6 +16,7 @@
 
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 
 extern A_Expr *makeA_Expr(A_Expr_Kind kind, List *name,
@@ -105,4 +106,6 @@ extern GroupingSet *makeGroupingSet(GroupingSetKind kind, List *content, int loc
 
 extern VacuumRelation *makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols);
 
+extern UniqueKey* makeUniqueKey(List *exprs, bool multi_nullvals, bool onerow);
+
 #endif							/* MAKEFUNC_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b4e4..41110ed888 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -264,6 +264,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 907afd383d..575353d86c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -730,6 +730,7 @@ typedef struct RelOptInfo
 	QualCost	baserestrictcost;	/* cost of evaluating the above */
 	Index		baserestrict_min_security;	/* min security_level found in
 											 * baserestrictinfo */
+	List	   *uniquekeys;		/* List of UniqueKey */
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
 	bool		has_eclass_joins;	/* T means joininfo is incomplete */
@@ -1047,6 +1048,31 @@ typedef struct PathKey
 } PathKey;
 
 
+/*
+ * UniqueKey
+ *
+ * Represents the unique properties held by a RelOptInfo.
+ *
+ * exprs is a list of exprs over which the current RelOptInfo is unique.
+ *
+ * multi_nullvals: true means multiple null values may exist in these exprs,
+ * so uniqueness is not guaranteed in that case.  This field is needed for
+ * remove_useless_joins & reduce_unique_semijoins, where we don't mind such
+ * duplicated NULL values.  It is set to true in two cases: a unique key from
+ * a unique index whose related column is nullable, and some outer join
+ * cases; see populate_joinrel_uniquekeys for details.
+ *
+ * onerow means the relation returns exactly one row, e.g. a filter on a
+ * unique index, an aggregate without a Group node, or a join of two one-row
+ * relations.  As an optimization, when onerow is true we do not record every
+ * expr as a UniqueKey; we store exprs as NIL instead.
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+	List	   *exprs;
+	bool		multi_nullvals;
+	bool		onerow;
+} UniqueKey;
+
 /*
  * PathTarget
  *
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 14ea2766ad..621f54a9f8 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -528,6 +528,8 @@ extern bool list_member_ptr(const List *list, const void *datum);
 extern bool list_member_int(const List *list, int datum);
 extern bool list_member_oid(const List *list, Oid datum);
 
+extern bool list_is_subset(const List *members, const List *target);
+
 extern List *list_delete(List *list, void *datum);
 extern List *list_delete_ptr(List *list, void *datum);
 extern List *list_delete_int(List *list, int datum);
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index d6a27a60dd..e87c92a054 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -32,4 +32,7 @@ extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 extern AppendRelInfo **find_appinfos_by_relids(PlannerInfo *root,
 											   Relids relids, int *nappinfos);
 
+extern AppendRelInfo *find_appinfo_by_child(PlannerInfo *root, Index child_index);
+extern Var *find_parent_var(AppendRelInfo *appinfo, Var *child_var);
+
 #endif							/* APPENDINFO_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 3e4171056e..9445141263 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -23,6 +23,7 @@
 #define OPTIMIZER_H
 
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 /*
  * We don't want to include nodes/pathnodes.h here, because non-planner
@@ -156,6 +157,7 @@ extern TargetEntry *get_sortgroupref_tle(Index sortref,
 										 List *targetList);
 extern TargetEntry *get_sortgroupclause_tle(SortGroupClause *sgClause,
 											List *targetList);
+extern Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle);
 extern Node *get_sortgroupclause_expr(SortGroupClause *sgClause,
 									  List *targetList);
 extern List *get_sortgrouplist_exprs(List *sgClauses,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 10b6e81079..9217a8d6c6 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -240,5 +240,48 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 									   int strategy, bool nulls_first);
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
+extern List *select_mergejoin_clauses(PlannerInfo *root,
+									  RelOptInfo *joinrel,
+									  RelOptInfo *outerrel,
+									  RelOptInfo *innerrel,
+									  List *restrictlist,
+									  JoinType jointype,
+									  bool *mergejoin_allowed);
+
+/*
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ */
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel,
+										List* unique_index_list);
+extern void populate_partitionedrel_uniquekeys(PlannerInfo *root,
+												RelOptInfo *rel,
+												List *childrels);
+extern void populate_distinctrel_uniquekeys(PlannerInfo *root,
+											RelOptInfo *inputrel,
+											RelOptInfo *distinctrel);
+extern void populate_grouprel_uniquekeys(PlannerInfo *root,
+										 RelOptInfo *grouprel,
+										 RelOptInfo *inputrel);
+extern void populate_unionrel_uniquekeys(PlannerInfo *root,
+										  RelOptInfo *unionrel);
+extern void simple_copy_uniquekeys(RelOptInfo *oldrel,
+								   RelOptInfo *newrel);
+extern void convert_subquery_uniquekeys(PlannerInfo *root,
+										RelOptInfo *currel,
+										RelOptInfo *sub_final_rel);
+extern void populate_joinrel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *rel1,
+										RelOptInfo *rel2,
+										List *restrictlist,
+										JoinType jointype);
+
+extern bool relation_has_uniquekeys_for(PlannerInfo *root,
+										RelOptInfo *rel,
+										List *exprs,
+										bool allow_multinulls);
+extern bool relation_is_onerow(RelOptInfo *rel);
 
 #endif							/* PATHS_H */
-- 
2.21.0

Attachment: v8-0006-Join-removal-at-run-time-with-UniqueKey.patch (application/octet-stream)
From e1fb93b999f5d1336840f218563caaa1ae23a4dc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Thu, 7 May 2020 08:36:29 +0800
Subject: [PATCH v8 6/7] Join removal at run-time with UniqueKey.

We delay the join removal until build_join_rel time to get
the benefits of UniqueKey and to bypass some limitations of the current
remove_useless_joins.  However, this new strategy can't be a
replacement for the current one, since at run time it only knows about
2 relations, so it can't handle something like

SELECT a.* FROM a LEFT JOIN (b left join c on b.c_id = c.id)
ON (a.b_id = b.id);

which can be handled by the current strategy.
---
 src/backend/optimizer/path/joinrels.c |   8 +-
 src/backend/optimizer/util/relnode.c  | 263 +++++++++++++++++++++++++-
 src/include/optimizer/pathnode.h      |   4 +-
 src/test/regress/expected/join.out    |  39 ++++
 src/test/regress/sql/join.sql         |  26 +++
 5 files changed, 333 insertions(+), 7 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index c0d2332caf..1e6816d1b1 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -691,6 +691,7 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 	SpecialJoinInfo sjinfo_data;
 	RelOptInfo *joinrel;
 	List	   *restrictlist;
+	bool	innerrel_removed = false;
 
 	/* We should never try to join two overlapping sets of rels. */
 	Assert(!bms_overlap(rel1->relids, rel2->relids));
@@ -744,7 +745,7 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 	 * goes with this particular joining.
 	 */
 	joinrel = build_join_rel(root, joinrelids, rel1, rel2, sjinfo,
-							 &restrictlist);
+							 &restrictlist, &innerrel_removed);
 
 	/*
 	 * If we've already proven this join is empty, we needn't consider any
@@ -756,9 +757,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 		return joinrel;
 	}
 
+	if (!innerrel_removed)
 	/* Add paths to the join relation. */
-	populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
-								restrictlist);
+		populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
+									restrictlist);
 
 	bms_free(joinrelids);
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 591a0a3957..ac086930d6 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -22,6 +22,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/inherit.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/placeholder.h"
@@ -74,6 +75,11 @@ static void build_child_join_reltarget(PlannerInfo *root,
 									   int nappinfos,
 									   AppendRelInfo **appinfos);
 
+static bool join_canbe_removed(PlannerInfo *root,
+							   SpecialJoinInfo *sjinfo,
+							   RelOptInfo *joinrel,
+							   RelOptInfo *innerrel,
+							   List *restrictlist);
 
 /*
  * setup_simple_rel_arrays
@@ -580,7 +586,8 @@ build_join_rel(PlannerInfo *root,
 			   RelOptInfo *outer_rel,
 			   RelOptInfo *inner_rel,
 			   SpecialJoinInfo *sjinfo,
-			   List **restrictlist_ptr)
+			   List **restrictlist_ptr,
+			   bool *innerrel_removed)
 {
 	RelOptInfo *joinrel;
 	List	   *restrictlist;
@@ -719,6 +726,64 @@ build_join_rel(PlannerInfo *root,
 	 */
 	joinrel->has_eclass_joins = has_relevant_eclass_joinclause(root, joinrel);
 
+	if (join_canbe_removed(root, sjinfo,
+						   joinrel, inner_rel,
+						   restrictlist))
+	{
+		ListCell *lc;
+
+		joinrel->rows = outer_rel->rows;
+		joinrel->consider_startup = outer_rel->consider_param_startup;
+		joinrel->consider_param_startup = outer_rel->consider_param_startup;
+		joinrel->consider_parallel = outer_rel->consider_parallel;
+
+		/* Rely on the projection path to reduce the tlist. */
+		joinrel->reltarget = outer_rel->reltarget;
+
+		joinrel->direct_lateral_relids = outer_rel->direct_lateral_relids;
+		joinrel->lateral_relids = outer_rel->lateral_relids;
+
+		joinrel->unique_for_rels = outer_rel->unique_for_rels;
+		joinrel->non_unique_for_rels = outer_rel->non_unique_for_rels;
+		joinrel->baserestrictinfo = outer_rel->baserestrictinfo;
+		joinrel->baserestrictcost = outer_rel->baserestrictcost;
+		joinrel->baserestrict_min_security = outer_rel->baserestrict_min_security;
+		joinrel->uniquekeys = outer_rel->uniquekeys;
+		joinrel->consider_partitionwise_join = outer_rel->consider_partitionwise_join;
+		joinrel->top_parent_relids = outer_rel->top_parent_relids;
+
+	/* Some scan paths need to know which base relation to scan via the
+	 * relid field, so we have to use outer_rel->relid.
+	 */
+		joinrel->relid = outer_rel->relid;
+
+	/* Almost the same as above: some paths assert that the rtekind is
+	 * RTE_RELATION, so we need to set it the same as outer_rel's as well.
+	 */
+		joinrel->rtekind = RTE_RELATION;
+
+		/* Make sure the path->parent point to current joinrel, can't update it in-place. */
+		foreach(lc, outer_rel->pathlist)
+		{
+			Size sz = size_of_path(lfirst(lc));
+			Path *path = palloc(sz);
+			memcpy(path, lfirst(lc), sz);
+			path->parent = joinrel;
+			add_path(joinrel, path);
+		}
+
+		foreach(lc, joinrel->partial_pathlist)
+		{
+			Size sz = size_of_path(lfirst(lc));
+			Path *path = palloc(sz);
+			memcpy(path, lfirst(lc), sz);
+			path->parent = joinrel;
+			add_partial_path(joinrel, path);
+		}
+		*innerrel_removed = true;
+	}
+	else
+	{
 	/* Store the partition information. */
 	build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
 								 sjinfo->jointype);
@@ -747,7 +812,7 @@ build_join_rel(PlannerInfo *root,
 		is_parallel_safe(root, (Node *) restrictlist) &&
 		is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
 		joinrel->consider_parallel = true;
-
+	}
 	/* Add the joinrel to the PlannerInfo. */
 	add_join_rel(root, joinrel);
 
@@ -760,11 +825,18 @@ build_join_rel(PlannerInfo *root,
 	if (root->join_rel_level)
 	{
 		Assert(root->join_cur_level > 0);
-		Assert(root->join_cur_level <= bms_num_members(joinrel->relids));
+	   // Assert(root->join_cur_level <= bms_num_members(joinrel->relids));
 		root->join_rel_level[root->join_cur_level] =
 			lappend(root->join_rel_level[root->join_cur_level], joinrel);
 	}
 
+	/* elog(INFO, "lev-%d Build JoinRel (%s) with %s and %s, inner is removed: %d", */
+	/*	 root->join_cur_level, */
+	/*	 bmsToString(joinrelids), */
+	/*	 bmsToString(outer_rel->relids), */
+	/*	 bmsToString(inner_rel->relids), */
+	/*	 joinrel->removed); */
+
 	return joinrel;
 }
 
@@ -2028,3 +2100,188 @@ build_child_join_reltarget(PlannerInfo *root,
 	childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
 	childrel->reltarget->width = parentrel->reltarget->width;
 }
+
+static bool
+join_canbe_removed(PlannerInfo *root,
+				   SpecialJoinInfo *sjinfo,
+				   RelOptInfo *joinrel,
+				   RelOptInfo *innerrel,
+				   List *restrictlist)
+{
+	Bitmapset	*vars;
+	List	*exprs = NIL;
+	ListCell	*lc;
+	Bitmapset	*tmp;
+	bool	res;
+
+	if (sjinfo->jointype != JOIN_LEFT)
+		return false;
+
+	if (innerrel->uniquekeys == NIL)
+		return false;
+
+	/*
+	 * Check whether any of innerrel's columns are still needed above the join.
+	 */
+
+	vars = pull_varnos((Node*)joinrel->reltarget->exprs);
+	tmp = bms_intersect(vars, innerrel->relids);
+	if (!bms_is_empty(tmp))
+		return false;
+
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+		if (rinfo->can_join)
+		{
+			if (rinfo->mergeopfamilies != NIL)
+			{
+				if (bms_is_subset(rinfo->left_relids, innerrel->relids))
+					exprs = lappend(exprs, get_leftop(rinfo->clause));
+				else if (bms_is_subset(rinfo->right_relids, innerrel->relids))
+					exprs = lappend(exprs, get_rightop(rinfo->clause));
+				else
+					Assert(false);
+			}
+			else
+				/* Not mergeable join clause, we have to keep it */
+				return false;
+		}
+		else
+		{
+			/*
+			 * If the rinfo is not a joinable clause and it was not pushed down
+			 * into the base relation's baserestrictinfo, it must come from the
+			 * ON clause.  Example: SELECT .. FROM t1 LEFT JOIN t2 ON t1.a = 10;
+			 * In this case we can't remove the inner rel either.
+			 */
+			return false;
+		}
+	}
+	res = relation_has_uniquekeys_for(root, innerrel, exprs, true);
+	return res;
+}
+
+
+size_t
+size_of_path(Path *path)
+{
+	switch(path->type)
+	{
+		case T_Path:
+			return sizeof(Path);
+		case T_IndexPath:
+			return sizeof(IndexPath);
+		case T_BitmapHeapPath:
+			return sizeof(BitmapHeapPath);
+		case T_TidPath:
+			return sizeof(TidPath);
+		case T_SubqueryScanPath:
+			return sizeof(SubqueryScanPath);
+		case T_ForeignPath:
+			return sizeof(ForeignPath);
+		case T_CustomPath:
+			return sizeof(CustomPath);
+
+
+		case T_NestPath:
+			return sizeof(NestPath);
+
+
+		case T_MergePath:
+			return sizeof(MergePath);
+
+
+		case T_HashPath:
+			return sizeof(HashPath);
+
+
+		case T_AppendPath:
+			return sizeof(AppendPath);
+
+
+		case T_MergeAppendPath:
+			return sizeof(MergeAppendPath);
+
+
+		case T_GroupResultPath:
+			return sizeof(GroupResultPath);
+
+
+		case T_MaterialPath:
+			return sizeof(MaterialPath);
+
+
+		case T_UniquePath:
+			return sizeof(UniquePath);
+
+
+		case T_GatherPath:
+			return sizeof(GatherPath);
+
+
+		case T_GatherMergePath:
+			return sizeof(GatherMergePath);
+
+
+		case T_ProjectionPath:
+			return sizeof(ProjectionPath);
+
+
+		case T_ProjectSetPath:
+			return sizeof(ProjectSetPath);
+
+
+		case T_SortPath:
+			return sizeof(SortPath);
+
+
+		case T_IncrementalSortPath:
+			return sizeof(IncrementalSortPath);
+
+
+		case T_GroupPath:
+			return sizeof(GroupPath);
+
+
+		case T_UpperUniquePath:
+			return sizeof(UpperUniquePath);
+
+
+		case T_AggPath:
+			return sizeof(AggPath);
+
+
+		case T_GroupingSetsPath:
+			return sizeof(GroupingSetsPath);
+
+
+		case T_MinMaxAggPath:
+			return sizeof(MinMaxAggPath);
+
+
+		case T_WindowAggPath:
+			return sizeof(WindowAggPath);
+
+
+		case T_SetOpPath:
+			return sizeof(SetOpPath);
+
+
+		case T_RecursiveUnionPath:
+			return sizeof(RecursiveUnionPath);
+
+
+		case T_LockRowsPath:
+			return sizeof(LockRowsPath);
+		case T_ModifyTablePath:
+			return sizeof(ModifyTablePath);
+		case T_LimitPath:
+			return sizeof(LimitPath);
+		default:
+			elog(ERROR, "unrecognized path type: %s",
+				 nodeToString(&path->type));
+			break;
+	}
+	return 0;
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 715a24ad29..254961b2b4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -294,7 +294,8 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
 								  RelOptInfo *outer_rel,
 								  RelOptInfo *inner_rel,
 								  SpecialJoinInfo *sjinfo,
-								  List **restrictlist_ptr);
+								  List **restrictlist_ptr,
+								  bool *innerrel_removed);
 extern Relids min_join_parameterization(PlannerInfo *root,
 										Relids joinrelids,
 										RelOptInfo *outer_rel,
@@ -321,4 +322,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
 										RelOptInfo *parent_joinrel, List *restrictlist,
 										SpecialJoinInfo *sjinfo, JoinType jointype);
 
+extern size_t size_of_path(Path *path);
 #endif							/* PATHNODE_H */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 8378936eda..b59c9a73ae 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6283,3 +6283,42 @@ where exists (select 1 from j3
 (13 rows)
 
 drop table j3;
+create table m1 (a int primary key,  b int, c int);
+create table m2 (a int primary key,  b int, c int);
+create table m3 (a int primary key,  b int, c int);
+explain (verbose, costs off)
+select  t1.a
+from m3 t1
+left join (select m1.a from m1, m2 where m1.b = m2.a) t2
+on (t1.a = t2.a);
+        QUERY PLAN        
+--------------------------
+ Seq Scan on public.m3 t1
+   Output: t1.a
+(2 rows)
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on (m1.a = m2.a)
+and m1.b in (select b from m3);
+         QUERY PLAN         
+----------------------------
+ Seq Scan on public.m1
+   Output: m1.a, m1.b, m1.c
+(2 rows)
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on m1.b = m2.a
+and m2.b in (select b from m3);
+         QUERY PLAN         
+----------------------------
+ Seq Scan on public.m1
+   Output: m1.a, m1.b, m1.c
+(2 rows)
+
+drop table m1;
+drop table m2;
+drop table m3;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3312542411..317354547d 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2169,3 +2169,29 @@ where exists (select 1 from j3
       and t1.unique1 < 1;
 
 drop table j3;
+
+create table m1 (a int primary key,  b int, c int);
+create table m2 (a int primary key,  b int, c int);
+create table m3 (a int primary key,  b int, c int);
+
+explain (verbose, costs off)
+select  t1.a
+from m3 t1
+left join (select m1.a from m1, m2 where m1.b = m2.a) t2
+on (t1.a = t2.a);
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on (m1.a = m2.a)
+and m1.b in (select b from m3);
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on m1.b = m2.a
+and m2.b in (select b from m3);
+
+drop table m1;
+drop table m2;
+drop table m3;
-- 
2.21.0

Attachment: v8-0007-Renamed-adjust_inherited_tlist-List-tlist-AppendR.patch (application/octet-stream)
From e9c561ff12aed2c3a475763ab772a59f3a38b593 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 11 May 2020 15:57:54 +0800
Subject: [PATCH v8 7/7] Renamed adjust_inherited_tlist(List *tlist,
 AppendRelInfo *context) to

adjust_inherited_tlist(List *tlist, AppendRelInfo *appinfo) because the
`context` type around that code is adjust_appendrel_attrs_context; the
original variable name looked improper.
---
 src/backend/optimizer/util/appendinfo.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 44c37ecffc..42dec15f0f 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -38,7 +38,7 @@ static void make_inh_translation_list(Relation oldrelation,
 static Node *adjust_appendrel_attrs_mutator(Node *node,
 											adjust_appendrel_attrs_context *context);
 static List *adjust_inherited_tlist(List *tlist,
-									AppendRelInfo *context);
+									AppendRelInfo *appinfo);
 
 
 /*
@@ -637,7 +637,7 @@ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
  * Note that this is not needed for INSERT because INSERT isn't inheritable.
  */
 static List *
-adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
+adjust_inherited_tlist(List *tlist, AppendRelInfo *appinfo)
 {
 	bool		changed_it = false;
 	ListCell   *tl;
@@ -646,7 +646,7 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 	int			attrno;
 
 	/* This should only happen for an inheritance case, not UNION ALL */
-	Assert(OidIsValid(context->parent_reloid));
+	Assert(OidIsValid(appinfo->parent_reloid));
 
 	/* Scan tlist and update resnos to match attnums of child rel */
 	foreach(tl, tlist)
@@ -659,13 +659,13 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 
 		/* Look up the translation of this column: it must be a Var */
 		if (tle->resno <= 0 ||
-			tle->resno > list_length(context->translated_vars))
+			tle->resno > list_length(appinfo->translated_vars))
 			elog(ERROR, "attribute %d of relation \"%s\" does not exist",
-				 tle->resno, get_rel_name(context->parent_reloid));
-		childvar = (Var *) list_nth(context->translated_vars, tle->resno - 1);
+				 tle->resno, get_rel_name(appinfo->parent_reloid));
+		childvar = (Var *) list_nth(appinfo->translated_vars, tle->resno - 1);
 		if (childvar == NULL || !IsA(childvar, Var))
 			elog(ERROR, "attribute %d of relation \"%s\" does not exist",
-				 tle->resno, get_rel_name(context->parent_reloid));
+				 tle->resno, get_rel_name(appinfo->parent_reloid));
 
 		if (tle->resno != childvar->varattno)
 		{
@@ -719,7 +719,7 @@ adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 
 /*
  * find_appinfos_by_relids
- * 		Find AppendRelInfo structures for all relations specified by relids.
+ *		Find AppendRelInfo structures for all relations specified by relids.
  *
  * The AppendRelInfos are returned in an array, which can be pfree'd by the
  * caller. *nappinfos is set to the number of entries in the array.
-- 
2.21.0

#28Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andy Fan (#26)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Fri, May 8, 2020 at 7:27 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

+            else if (inner_is_onerow)
+            {
+                /* Since rows in innerrel can't be duplicated AND if innerrel is onerow,
+                 * the join result will be onerow also as well. Note: onerow implies
+                 * multi_nullvals = false.

I don't understand this comment. Why is there only one row in the other
relation which can join to this row?

I guess you may miss the onerow special case if I understand correctly.
inner_is_onerow means something like "SELECT xx FROM t1 where uk = 1".
innerrel can't be duplicated means: t1.y = t2.pk; so the final result is onerow
as well. One of the overall queries is SELECT .. FROM t1, t2 where t2.y = t2.pk;

I explained more about onerow in the v7 README.uniquekey document, just copy
it here.

For some reason this mail remained in my drafts without being sent.
Sending it now. Sorry.

My impression about the one row stuff is that there is too much
special casing around it. We should somehow structure the UniqueKey
data so that one row unique keys come naturally rather than being special
cased. E.g. every column in such a case is unique in the result, so
create as many UniqueKeys as the number of columns, or create one
unique key with no column as you have done, but handle it more
gracefully rather than spreading it all over the place.

Also, the amount of code that these patches change seems arguably to be
much larger than the feature's worth. But it indicates that we are
modifying/adding more code than necessary. Some of that code can be
merged into existing code which does similar things, as I have pointed
out in my previous comment.

Thanks for working on the expanded scope of the initial feature you
proposed. It makes the feature more useful, I think.

--
Best Wishes,
Ashutosh Bapat

#29Andy Fan
zhihui.fan1213@gmail.com
In reply to: Ashutosh Bapat (#28)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Wed, May 13, 2020 at 8:04 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
wrote:

On Fri, May 8, 2020 at 7:27 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

+            else if (inner_is_onerow)
+            {
+                /* Since rows in innerrel can't be duplicated AND if innerrel is onerow,
+                 * the join result will be onerow also as well. Note: onerow implies
+                 * multi_nullvals = false.

I don't understand this comment. Why is there only one row in the other
relation which can join to this row?

I guess you may miss the onerow special case if I understand correctly.
inner_is_onerow means something like "SELECT xx FROM t1 where uk = 1".
innerrel can't be duplicated means: t1.y = t2.pk; so the finally result is
onerow as well. One of the overall query is SELECT .. FROM t1, t2 where
t2.y = t2.pk;

I explained more about onerow in the v7 README.unqiuekey document, just
copy it here.

For some reason this mail remained in my drafts without being sent.
Sending it now. Sorry.

My impression about the one row stuff, is that there is too much
special casing around it. We should somehow structure the UniqueKey
data so that one row unique keys come naturally rather than special
cased. E.g every column in such a case is unique in the result so
create as many UniqueKeys are the number of columns

This was the beginning state of the UniqueKey; later David suggested
this as an optimization [1]. I bought into the idea, and later I found it
means more than the original one [2], so I think onerow is actually needed.
or create one
unique key with no column as you have done but handle it more
gracefully rather than spreading it all over the place.

I think this is what I do now, but it is possible that I spread it around
more than necessary; if so, please let me know. I have maintained
README.uniquekey carefully since v7 and improved it a lot in v8, so it may
be a good place to check.

Also, the amount of code that these patches changes seems to be much
larger than the feature's worth arguably. But it indicates that we are
modifying/adding more code than necessary. Some of that code can be
merged into existing code which does similar things as I have pointed
out in my previous comment.

I have reused the code in select_mergejoin_clauses rather than maintaining
my own copy in v8. Thanks a lot for that suggestion. This happened mainly
because I hadn't read enough code. I will go through more to see if I have
similar issues.

Thanks for working on the expanded scope of the initial feature you
proposed. But it makes the feature more useful, I think.

That's mainly because your suggestions are always insightful, which makes
me willing to continue to work on it, so thank you all!

===
In fact, I was hesitant about how to reply to an email when I send a new
version of the patch. One idea is that I should explain clearly what the
difference is between Vn and Vn-1. The other is that not many people read
Vn-1, so I would like to keep the email simple and keep the README clear
to save the reviewers' time. Actually there are more changes in v8 than I
stated above. For example: for a UniqueKey on a base relation, we will
remove an expr from the UniqueKey if the expr is a const. Unique on
(A, B), query is SELECT b FROM t WHERE a = 1; in v7 the UniqueKey is
(a, b), in v8 the UniqueKey is (b) only. But since most people have not
started to read it yet, I added such information to the README rather than
echoing the same in the email thread. I will try more to understand how to
communicate more smoothly, but any suggestion on this part is appreciated.

Best Regards
Andy Fan

#30Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#26)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Hi Ashutosh:

All your suggestions are followed except the UNION ALL one; I replied with
the reason below. For the suggestions about partitioned tables, there look
to be a lot of cases to handle, so I summarized/enriched your idea in the
README and this email thread, and we can continue to talk about that.

+
+    foreach(lc,  unionrel->reltarget->exprs)
+    {
+        exprs = lappend(exprs, lfirst(lc));
+        colnos = lappend_int(colnos, i);
+        i++;
+    }

This should be only possible when it's not UNION ALL. We should add some
assert or protection for that.

OK, actually I called this function in generate_union_paths, which handles
the UNION case only. I will add the Assert anyway.

Finally I found I would have to add one more parameter to
populate_unionrel_uniquekeys whose only use would be for the Assert, so in
the end I didn't do it.

+
+    /* Fast path */
+    if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+        return;
+
+    outer_is_onerow = relation_is_onerow(outerrel);
+    inner_is_onerow = relation_is_onerow(innerrel);
+
+    outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
outerrel);
+    innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
innerrel);
+
+    clause_list = gather_mergeable_joinclauses(joinrel, outerrel,
innerrel,
+                                               restrictlist, jointype);

Something similar happens in select_mergejoin_clauses().

I didn't realize this before. I will refactor this code accordingly if
necessary after reading that.

I reused select_mergejoin_clauses and removed the duplicated code in
uniquekeys.c in v8.

At least from the first reading of this patch, I get an impression that
all these functions which are going through clause lists and index lists
should be merged into other functions which go through these lists hunting
for some information useful to the optimizer.

+

+
+    if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list,
false))
+    {
+        foreach(lc, innerrel_ukey_ctx)
+        {
+            UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+            if (!list_is_subset(ctx->uniquekey->exprs,
joinrel->reltarget->exprs))
+            {
+                /* The UniqueKey on baserel is not useful on the joinrel
*/

A joining relation need not be a base rel always, it could be a join rel
as well.

good catch.

Fixed.

+                ctx->useful = false;
+                continue;
+            }
+            if ((jointype == JOIN_LEFT || jointype == JOIN_FULL) &&
!ctx->uniquekey->multi_nullvals)
+            {
+                /* Change the multi_nullvals to true at this case */

Need a comment explaining this. Generally, I feel, this and other
functions in
this file need good comments explaining the logic esp. "why" instead of
"what".

Exactly.

Done in v8.

Will continue reviewing your new set of patches as time permits.

Thank you! Actually there is no big difference between v6 and v7 regarding
the UniqueKey part except 2 bug fixes. However, v7 has some more
documentation, comment improvements and code refactoring/splitting, which
may be helpful for review. You may try v7 next time if v8 has not come
yet :)

v8 has come :)

Best Regards
Andy Fan

#31David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#29)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Thu, 14 May 2020 at 03:48, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Wed, May 13, 2020 at 8:04 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

My impression about the one row stuff, is that there is too much
special casing around it. We should somehow structure the UniqueKey
data so that one row unique keys come naturally rather than special
cased. E.g every column in such a case is unique in the result so
create as many UniqueKeys are the number of columns

This is the beginning state of the UniqueKey, later David suggested
this as an optimization[1], I buy-in the idea and later I found it mean
more than the original one [2], so I think onerow is needed actually.

Having the "onerow" flag was not how I intended it to work.

Here's an example of how I thought it should work:

Assume t1 has UniqueKeys on {a}

SELECT DISTINCT a,b FROM t1;

Here the DISTINCT can be a no-op due to "a" being unique within t1. Or
more basically, {a} is a subset of {a,b}.

The code which does this is relation_has_uniquekeys_for(), which
contains the code:

+ if (list_is_subset(ukey->exprs, exprs))
+ return true;

In this case, ukey->exprs is {a} and exprs is {a,b}. So, if the
UniqueKey's exprs are a subset of, in this case, the DISTINCT exprs
then relation_has_uniquekeys_for() returns true. Basically
list_is_subset({a}, {a,b}), Answer: "Yes".

For the onerow stuff, if we can prove the relation returns only a
single row, e.g. an aggregate without a GROUP BY, or there are
EquivalenceClasses with ec_has_const == true for each key of a unique
index, then why can't we just set the UniqueKeys to {}? That would
mean the code to determine if we can avoid performing an explicit
DISTINCT operation would be called with list_is_subset({}, {a,b}),
which is also true, in fact, an empty set is a subset of any set. Why
is there a need to special case that fact?

In light of those thoughts, can you explain why you think we need to
keep the onerow flag?

David

#32Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#31)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Thu, May 14, 2020 at 6:20 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 14 May 2020 at 03:48, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Wed, May 13, 2020 at 8:04 PM Ashutosh Bapat <

ashutosh.bapat.oss@gmail.com> wrote:

My impression about the one row stuff, is that there is too much
special casing around it. We should somehow structure the UniqueKey
data so that one row unique keys come naturally rather than special
cased. E.g every column in such a case is unique in the result so
create as many UniqueKeys are the number of columns

This is the beginning state of the UniqueKey, later David suggested
this as an optimization[1], I buy-in the idea and later I found it mean
more than the original one [2], so I think onerow is needed actually.

Having the "onerow" flag was not how I intended it to work.

Thanks for the detailed explanation. So I think we do need to handle
onerow specially (it means more things than adding each column as a
UniqueKey), but we don't need the onerow flag since we can tell it by
ukey->exprs == NIL.

During the development of this feature, I added some Asserts as a double
check for onerow and exprs; they helped me find some special cases, like
SELECT FROM multirows UNION SELECT FROM multirows; where the targetlist
is NIL. (I just found that the above case returns onerow as well.) So the
onerow flag allows us to check these special things with more attention.
Even though this was not the original intention, it looks like that is one
of its purposes now.

However, I feel that removing the onerow flag doesn't require many code
changes. Almost all the special cases which were needed before are still
needed after that, and all the functions based on it, like
relation_is_onerow/add_uniquekey_onerow, are still valid; we just need to
change the implementation. So do you think we need to remove the onerow
flag entirely?

Best Regards
Andy Fan

#33Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#29)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Wed, May 13, 2020 at 11:48 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

My impression about the one row stuff, is that there is too much
special casing around it. We should somehow structure the UniqueKey
data so that one row unique keys come naturally rather than special
cased. E.g every column in such a case is unique in the result so
create as many UniqueKeys are the number of columns

This is the beginning state of the UniqueKey, later David suggested
this as an optimization[1], I buy-in the idea and later I found it mean
more than the original one [2], so I think onerow is needed actually.

I just found I forgot the links yesterday. Here they are.

[1]: /messages/by-id/CAApHDvqg+mQyxJnCizE=qJcBL90L=oFXTFyiwWWEaUnzG7Uc5Q@mail.gmail.com

[2]: /messages/by-id/CAKU4AWrGrs0Vk5OrZmS1gbTA2ijDH18NHKnXZTPZNuupn++ing@mail.gmail.com

#34Dmitry Dolgov
9erthalion6@gmail.com
In reply to: David Rowley (#15)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Tue, Apr 14, 2020 at 09:09:31PM +1200, David Rowley wrote:

The infrastructure (knowing the unique properties of a RelOptInfo), as
provided by the patch Andy has been working on, which is based on my
rough prototype version, I believe should be used for the skip scans
patch as well.

Hi,

Following our agreement about making the skip scan patch use the
UniqueKeys implementation from this thread, I've rebased index skip scan
on the first two patches from the v8 series [1] (if I understand correctly
those two introduce the concept, and the others just use it). I would
like to clarify a couple of things:

* It seems populate_baserel_uniquekeys, which actually sets uniquekeys,
is called after create_index_paths, where index skip scan already
needs to use them. Is it possible to call it earlier?

* Do I understand correctly, that a UniqueKey would be created in a
simplest case only when an index is unique? This makes it different
from what was implemented for index skip scan, since there UniqueKeys
also represents potential to use non-unique index to facilitate search
for unique values via skipping.

[1]: /messages/by-id/CAKU4AWpOM3_J-B=wQtCeU1TGr89MhpJBBkv2he1tAeQz6i4XNw@mail.gmail.com

#35David Rowley
dgrowleyml@gmail.com
In reply to: Dmitry Dolgov (#34)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Fri, 22 May 2020 at 07:49, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

* It seems populate_baserel_uniquekeys, which actually sets uniquekeys,
is called after create_index_paths, where index skip scan already
needs to use them. Is it possible to call it earlier?

Seems reasonable. I originally put it after check_index_predicates().
We need to wait until at least then so we can properly build
UniqueKeys for partial indexes.

* Do I understand correctly, that a UniqueKey would be created in a
simplest case only when an index is unique? This makes it different
from what was implemented for index skip scan, since there UniqueKeys
also represents potential to use non-unique index to facilitate search
for unique values via skipping.

The way I picture the two working together is that the core UniqueKey
patch adds UniqueKeys to RelOptInfos to allow us to have knowledge
about what they're unique on based on the base relation's unique
indexes.

For Skipscans, that patch must expand on UniqueKeys to teach Paths
about them. I imagine we'll set some required UniqueKeys during
standard_qp_callback() and then we'll try to create some Skip Scan
paths (which are marked with UniqueKeys) if the base relation does not
already have UniqueKeys that satisfy the required UniqueKeys that were
set during standard_qp_callback(). In the future, there may be other
reasons to create Skip Scan paths for certain rels, e.g if they're on
the inner side of a SEMI/ANTI join, it might be useful to try those
when planning joins.

David

#36David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#32)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Thu, 14 May 2020 at 14:39, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Thu, May 14, 2020 at 6:20 AM David Rowley <dgrowleyml@gmail.com> wrote:

Having the "onerow" flag was not how I intended it to work.

Thanks for the detailed explanation. So I think we do need to handle onerow
specially (it means more than adding each column as a UniqueKey), but we
don't need the onerow flag since we can tell it by ukey->exprs == NIL.

During the development of this feature, I added some Asserts as double checks
for onerow and exprs. They helped me find some special cases, like
SELECT FROM multirows UNION SELECT FROM multirows; where the targetlist is NIL.
(I just found that the above case returns onerow as well.) So the onerow flag
lets us check these special cases with more attention. Even though this was not
the original intention, it looks like that is one of its purposes now.

But surely that special case should just go in
populate_unionrel_uniquekeys(). If the targetlist is empty, then add a
UniqueKey with an empty set of exprs.
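
As an aside for readers following this: once "at most one row" is encoded as a
UniqueKey with an empty exprs list, the usual subset test absorbs the onerow
case for free, because the empty set is a subset of every target list. A toy
Python model (names and structures invented for illustration, not the patch's
actual C code):

```python
# Toy model of UniqueKey satisfaction. "uniquekeys" is a list of
# expression sets; a relation is unique on "target" if some known
# UniqueKey's expression set is a subset of it.
def is_unique_for(uniquekeys, target):
    target = set(target)
    return any(set(uk) <= target for uk in uniquekeys)

# "At most one row" modelled as a single UniqueKey with no exprs:
onerow = [frozenset()]
print(is_unique_for(onerow, ["a"]))      # True: one row is unique on anything
print(is_unique_for(onerow, []))         # True: even on an empty targetlist

# A relation with a unique index on (a):
plain = [frozenset({"a"})]
print(is_unique_for(plain, ["a", "b"]))  # True: superset of a unique key
print(is_unique_for(plain, ["b"]))       # False: b alone is not unique
```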

However, I feel that removing the onerow flag doesn't require many code
changes. Almost all the special cases which were needed before are still needed
after that, and all the functions based on it, like relation_is_onerow
/add_uniquekey_onerow, are still valid; we just need to change the
implementation. So do you think we need to remove the onerow flag entirely?

Well, at the moment I'm not quite understanding why it's needed. If
it's not needed then we should remove it. If it turns out there is
some valid reason for it, then we should keep it.

David

#37Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#35)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Fri, May 22, 2020 at 4:40 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 22 May 2020 at 07:49, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

* It seems populate_baserel_uniquekeys, which actually sets uniquekeys,
is called after create_index_paths, where index skip scan already
needs to use them. Is it possible to call it earlier?

Seems reasonable. I originally put it after check_index_predicates().
We need to wait until at least then so we can properly build
UniqueKeys for partial indexes.

That looks like a very valid reason; I will add this in the next version.

* Do I understand correctly, that a UniqueKey would be created in a
simplest case only when an index is unique? This makes it different
from what was implemented for index skip scan, since there UniqueKeys
also represents potential to use non-unique index to facilitate search
for unique values via skipping.

The way I picture the two working together is that the core UniqueKey
patch adds UniqueKeys to RelOptInfos to allow us to have knowledge
about what they're unique on based on the base relation's unique
indexes.

For Skipscans, that patch must expand on UniqueKeys to teach Paths

about them. I imagine we'll set some required UniqueKeys during
standard_qp_callback() and then we'll try to create some Skip Scan
paths (which are marked with UniqueKeys) if the base relation does not
already have UniqueKeys that satisfy the required UniqueKeys that were
set during standard_qp_callback(). In the future, there may be other
reasons to create Skip Scan paths for certain rels, e.g if they're on
the inner side of a SEMI/ANTI join, it might be useful to try those
when planning joins.

Yes. In the current implementation, we also add UniqueKeys during
create_xxx_paths, where xxx may be grouping/union. After the index skip scan
patch, we can do similar things in create_indexskip_path.

--
Best Regards
Andy Fan

#38Andy Fan
zhihui.fan1213@gmail.com
In reply to: Dmitry Dolgov (#34)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

if I understand correctly those two are introducing the concept, and
others are just using it

You understand it correctly.

--
Best Regards
Andy Fan

#39Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#36)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Fri, May 22, 2020 at 4:52 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 14 May 2020 at 14:39, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Thu, May 14, 2020 at 6:20 AM David Rowley <dgrowleyml@gmail.com> wrote:

Having the "onerow" flag was not how I intended it to work.

Thanks for the detailed explanation. So I think we do need to handle onerow
specially (it means more than adding each column as a UniqueKey), but we
don't need the onerow flag since we can tell it by ukey->exprs == NIL.

During the development of this feature, I added some Asserts as double checks
for onerow and exprs. They helped me find some special cases, like
SELECT FROM multirows UNION SELECT FROM multirows; where the targetlist is NIL.
(I just found that the above case returns onerow as well.) So the onerow flag
lets us check these special cases with more attention. Even though this was not
the original intention, it looks like that is one of its purposes now.

But surely that special case should just go in
populate_unionrel_uniquekeys(). If the targetlist is empty, then add a
UniqueKey with an empty set of exprs.

That is correct for this special case.

However, I feel that removing the onerow flag doesn't require many code
changes. Almost all the special cases which were needed before are still needed
after that, and all the functions based on it, like relation_is_onerow
/add_uniquekey_onerow, are still valid; we just need to change the
implementation. So do you think we need to remove the onerow flag entirely?

Well, at the moment I'm not quite understanding why it's needed. If
it's not needed then we should remove it. If it turns out there is
some valid reason for it, then we should keep it.

Currently I use it to detect more special cases which we couldn't imagine at
first; we can understand it as being used for debug/Assert purposes only.
After the main code is reviewed, it can be removed (given that the change is
tiny).

--
Best Regards
Andy Fan

#40Dmitry Dolgov
9erthalion6@gmail.com
In reply to: David Rowley (#35)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Fri, May 22, 2020 at 08:40:17AM +1200, David Rowley wrote:

The way I picture the two working together is that the core UniqueKey
patch adds UniqueKeys to RelOptInfos to allow us to have knowledge
about what they're unique on based on the base relation's unique
indexes.

Yep, I'm onboard with the idea.

For Skipscans, that patch must expand on UniqueKeys to teach Paths
about them.

That's already there.

I imagine we'll set some required UniqueKeys during
standard_qp_callback()

In standard_qp_callback, because pathkeys are computed at this point I
guess?

and then we'll try to create some Skip Scan
paths (which are marked with UniqueKeys) if the base relation does not
already have UniqueKeys that satisfy the required UniqueKeys that were
set during standard_qp_callback().

For a simple distinct query those UniqueKeys would be set based on
distinct clause. If I understand correctly, the very same is implemented
right now in create_distinct_paths, just after building all index paths,
so wouldn't it be just a duplication?

In general UniqueKeys in the skip scan patch were created from
distinctClause in build_index_paths (looks similar to what you've
described) and then based on them created index skip scan paths. So my
expectations were that the patch from this thread would work similar.

#41David Rowley
dgrowleyml@gmail.com
In reply to: Dmitry Dolgov (#40)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Sun, 24 May 2020 at 04:14, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Fri, May 22, 2020 at 08:40:17AM +1200, David Rowley wrote:
I imagine we'll set some required UniqueKeys during
standard_qp_callback()

In standard_qp_callback, because pathkeys are computed at this point I
guess?

Yes. In particular, we set the pathkeys for DISTINCT clauses there.

and then we'll try to create some Skip Scan
paths (which are marked with UniqueKeys) if the base relation does not
already have UniqueKeys that satisfy the required UniqueKeys that were
set during standard_qp_callback().

For a simple distinct query those UniqueKeys would be set based on
distinct clause. If I understand correctly, the very same is implemented
right now in create_distinct_paths, just after building all index paths,
so wouldn't it be just a duplication?

I think we need to create the skip scan paths when we create the other
paths for base relations. We shouldn't be adjusting existing index
paths during create_distinct_paths(). The last code I saw for the
skip scans patch did something like if (IsA(path, IndexScanPath)) in
create_distinct_paths(), but that's only ever going to work when the
query is to a single relation. You'll never see IndexScanPaths in the
upper planner's paths when there are joins. You'd see join type paths
instead. It is possible to make use of skip scans for DISTINCT when
the query has joins. We'd just need to ensure the join does not
duplicate the unique rows from the skip scanned relation.
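
To make the join argument concrete, here is a small Python sketch (invented
data, not planner code) of why joining to a relation on one of its unique
keys cannot duplicate the skip-scanned side's rows:

```python
# t1 has unique column a; t2 has unique column a. Joining t1 to t2 on
# t2's unique column means each t1 row matches at most one t2 row, so
# the join preserves whatever uniqueness t1's side had.
t1 = [(1, 10), (2, 10), (3, 20)]     # (a, b)
t2 = [(1, "x"), (2, "y"), (3, "z")]  # (a, c)

joined = [(a1, b, c) for (a1, b) in t1 for (a2, c) in t2 if a1 == a2]
assert len(joined) == len(t1)  # no t1 row was duplicated by the join

# So DISTINCT b over the join has the same candidate values as
# DISTINCT b over t1 alone, and a skip scan on t1(b) could still help.
distinct_b = sorted({b for (_, b, _) in joined})
print(distinct_b)  # [10, 20]
```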

In general UniqueKeys in the skip scan patch were created from
distinctClause in build_index_paths (looks similar to what you've
described) and then based on them created index skip scan paths. So my
expectations were that the patch from this thread would work similar.

The difference will be that you'd be setting some distinct_uniquekeys
in standard_qp_callback() to explicitly request that some skip scan
paths be created for the uniquekeys, whereas the patch here just does
not bother doing DISTINCT if the upper relation already has unique
keys that state that the DISTINCT is not required. The skip scans
patch should check if the RelOptInfo for the uniquekeys set in
standard_qp_callback() are already mentioned in the RelOptInfo's
uniquekeys. If they are then there's no point in skip scanning as the
rel is already unique for the distinct_uniquekeys.

David

#42Dmitry Dolgov
9erthalion6@gmail.com
In reply to: David Rowley (#41)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Mon, May 25, 2020 at 06:34:30AM +1200, David Rowley wrote:

For a simple distinct query those UniqueKeys would be set based on
distinct clause. If I understand correctly, the very same is implemented
right now in create_distinct_paths, just after building all index paths,
so wouldn't it be just a duplication?

I think we need to create the skip scan paths when we create the other
paths for base relations. We shouldn't be adjusting existing index
paths during create_distinct_paths(). The last code I saw for the
skip scans patch did something like if (IsA(path, IndexScanPath)) in
create_distinct_paths()

That hasn't been the case since late March.

In general UniqueKeys in the skip scan patch were created from
distinctClause in build_index_paths (looks similar to what you've
described) and then based on them created index skip scan paths. So my
expectations were that the patch from this thread would work similar.

The difference will be that you'd be setting some distinct_uniquekeys
in standard_qp_callback() to explicitly request that some skip scan
paths be created for the uniquekeys, whereas the patch here just does
not bother doing DISTINCT if the upper relation already has unique
keys that state that the DISTINCT is not required. The skip scans
patch should check if the RelOptInfo for the uniquekeys set in
standard_qp_callback() are already mentioned in the RelOptInfo's
uniquekeys. If they are then there's no point in skip scanning as the
rel is already unique for the distinct_uniquekeys.

It sounds like it makes the semantics of UniqueKey a bit more confusing,
doesn't it? At the moment it says:

Represents the unique properties held by a RelOptInfo.

With the proposed changes it would be "unique properties that are held"
and "unique properties that are requested", which are partially
duplicated, but stored in different fields. From the skip scan
patch perspective it probably doesn't make any difference; it seems like
the implementation would be almost the same, just the created UniqueKeys
would be of a different type. But I'm afraid potential future users of
UniqueKeys could be easily confused.

#43David Rowley
dgrowleyml@gmail.com
In reply to: Dmitry Dolgov (#42)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Mon, 25 May 2020 at 19:14, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Mon, May 25, 2020 at 06:34:30AM +1200, David Rowley wrote:
The difference will be that you'd be setting some distinct_uniquekeys
in standard_qp_callback() to explicitly request that some skip scan
paths be created for the uniquekeys, whereas the patch here just does
not bother doing DISTINCT if the upper relation already has unique
keys that state that the DISTINCT is not required. The skip scans
patch should check if the RelOptInfo for the uniquekeys set in
standard_qp_callback() are already mentioned in the RelOptInfo's
uniquekeys. If they are then there's no point in skip scanning as the
rel is already unique for the distinct_uniquekeys.

It sounds like it makes the semantics of UniqueKey a bit more confusing,
doesn't it? At the moment it says:

Represents the unique properties held by a RelOptInfo.

With the proposed changes it would be "unique properties that are held"
and "unique properties that are requested", which are partially
duplicated, but stored in different fields. From the skip scan
patch perspective it probably doesn't make any difference; it seems like
the implementation would be almost the same, just the created UniqueKeys
would be of a different type. But I'm afraid potential future users of
UniqueKeys could be easily confused.

If there's some comment that says UniqueKeys are for RelOptInfos, then
perhaps that comment just needs to be expanded to mention the Path
uniqueness when we add the uniquekeys field to Path.

I think the main point of basing skip scans on top of this uniquekeys
patch is to ensure it's the right thing for the job. I don't think
it's realistic to be maintaining two different sets of infrastructure
which serve a very similar purpose. It's important we make UniqueKeys
general purpose enough to support future useful forms of optimisation.
Basing skip scans on it seems like a good exercise towards that. I'm
not expecting that we need to make zero changes here to allow it to
work well with skip scans.

David

#44Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#41)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Mon, May 25, 2020 at 2:34 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Sun, 24 May 2020 at 04:14, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Fri, May 22, 2020 at 08:40:17AM +1200, David Rowley wrote:
I imagine we'll set some required UniqueKeys during
standard_qp_callback()

In standard_qp_callback, because pathkeys are computed at this point I
guess?

Yes. In particular, we set the pathkeys for DISTINCT clauses there.

Actually I have some difficulty understanding from here on, so I tried to read
the index skip scan patch to fully understand the requirement, but I haven't
gotten it so far [1]. So what are the "UniqueKeys" in "UniqueKeys during
standard_qp_callback()", and what does "pathkeys are computed at this point"
mean? I tried to think of it as root->distinct_pathkeys; however, I didn't
fully understand what root->distinct_pathkeys is used for either.

[1]: /messages/by-id/CAKU4AWq=wWkAo-CDOQ5Ea6UwYvZCgb501w6iqU0rtnTT-zg6bQ@mail.gmail.com

--
Best Regards
Andy Fan

#45David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#44)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Fri, 5 Jun 2020 at 14:36, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Mon, May 25, 2020 at 2:34 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Sun, 24 May 2020 at 04:14, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Fri, May 22, 2020 at 08:40:17AM +1200, David Rowley wrote:
I imagine we'll set some required UniqueKeys during
standard_qp_callback()

In standard_qp_callback, because pathkeys are computed at this point I
guess?

Yes. In particular, we set the pathkeys for DISTINCT clauses there.

Actually I have some difficulty understanding from here on, so I tried to read
the index skip scan patch to fully understand the requirement, but I haven't
gotten it so far [1]. So what are the "UniqueKeys" in "UniqueKeys during
standard_qp_callback()", and what does "pathkeys are computed at this point"
mean? I tried to think of it as root->distinct_pathkeys; however, I didn't
fully understand what root->distinct_pathkeys is used for either.

In standard_qp_callback(), what we'll do with uniquekeys is pretty
much what we already do with pathkeys there. Basically pathkeys are
set there to have the planner attempt to produce a plan that satisfies
those pathkeys. Notice at the end of standard_qp_callback() we set
the pathkeys according to the first upper planner operation that'll
need to make use of those pathkeys. e.g, If there's a GROUP BY and a
DISTINCT in the query, then use the pathkeys for GROUP BY, since that
must occur before DISTINCT. Likely uniquekeys will want to follow the
same rules there for the operations that can make use of paths with
uniquekeys, which in this case, I believe, will be the same as the
example I just mentioned for pathkeys, except we'll only be able to
support GROUP BY without any aggregate functions.
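
The clause-priority rule described above can be restated as a tiny Python
model (the function and its inputs are invented for illustration, and window
functions are omitted from this sketch):

```python
# Model of the rule at the end of standard_qp_callback(): aim the
# planner's sort effort at the first upper-planner operation that
# can use sorted input. GROUP BY runs before DISTINCT, which runs
# before ORDER BY, so the earliest non-empty key list wins.
def choose_query_pathkeys(group_pathkeys, distinct_pathkeys, sort_pathkeys):
    if group_pathkeys:
        return group_pathkeys
    if distinct_pathkeys:
        return distinct_pathkeys
    return sort_pathkeys

# Query with both GROUP BY b and DISTINCT: plan for the GROUP BY keys.
print(choose_query_pathkeys(["b"], ["b", "s"], []))  # ['b']
# Plain DISTINCT query: plan for the DISTINCT keys.
print(choose_query_pathkeys([], ["b"], ["c"]))       # ['b']
```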

David


[1] /messages/by-id/CAKU4AWq=wWkAo-CDOQ5Ea6UwYvZCgb501w6iqU0rtnTT-zg6bQ@mail.gmail.com

#46Andy Fan
zhihui.fan1213@gmail.com
In reply to: David Rowley (#45)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Fri, Jun 5, 2020 at 10:57 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 5 Jun 2020 at 14:36, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Mon, May 25, 2020 at 2:34 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Sun, 24 May 2020 at 04:14, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Fri, May 22, 2020 at 08:40:17AM +1200, David Rowley wrote:
I imagine we'll set some required UniqueKeys during
standard_qp_callback()

In standard_qp_callback, because pathkeys are computed at this point I
guess?

Yes. In particular, we set the pathkeys for DISTINCT clauses there.

Actually I have some difficulty understanding from here on, so I tried to read
the index skip scan patch to fully understand the requirement, but I haven't
gotten it so far [1]. So what are the "UniqueKeys" in "UniqueKeys during
standard_qp_callback()", and what does "pathkeys are computed at this point"
mean? I tried to think of it as root->distinct_pathkeys; however, I didn't
fully understand what root->distinct_pathkeys is used for either.

In standard_qp_callback(), what we'll do with uniquekeys is pretty
much what we already do with pathkeys there. Basically pathkeys are
set there to have the planner attempt to produce a plan that satisfies
those pathkeys. Notice at the end of standard_qp_callback() we set
the pathkeys according to the first upper planner operation that'll
need to make use of those pathkeys. e.g, If there's a GROUP BY and a
DISTINCT in the query, then use the pathkeys for GROUP BY, since that
must occur before DISTINCT.

Thanks for your explanation. I think I understand now based on your comments.
Take root->group_pathkeys for example: similar information is also available
in root->parse->groupClause, but we do use root->group_pathkeys with the
pathkeys_count_contained_in function in many places. That is mainly because
the contents of the two sometimes differ, as in the case of
pathkey_is_redundant.
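
One reason the two lists diverge, sketched in Python (a toy model of the
constant-qual case handled by pathkey_is_redundant, not the real
implementation):

```python
# For a query like  SELECT ... WHERE a = 1 GROUP BY a, b  the column a
# is held constant by the qual, so its pathkey is redundant and dropped
# from root->group_pathkeys, while parse->groupClause still lists both.
def nonredundant_pathkeys(group_clause, const_columns):
    return [col for col in group_clause if col not in const_columns]

group_clause = ["a", "b"]  # as written in the query
const_cols = {"a"}         # implied constant by WHERE a = 1
print(nonredundant_pathkeys(group_clause, const_cols))  # ['b']
```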

Likely uniquekeys will want to follow the
same rules there for the operations that can make use of paths with
uniquekeys, which in this case, I believe, will be the same as the
example I just mentioned for pathkeys, except we'll only be able to
support GROUP BY without any aggregate functions.

All the places where I want to use UniqueKey so far (like distinct, group by
and others) have an input relation (RelOptInfo), and the UniqueKey information
can be obtained there. At the same time, all the pathkeys in PlannerInfo are
used by the upper planner, but UniqueKey may sometimes be used in the current
planner, as in reduce_unique_semijoins/remove_useless_joins, so I am not sure
whether we must maintain uniquekeys in PlannerInfo.

--
Best Regards
Andy Fan

#47Dmitry Dolgov
9erthalion6@gmail.com
In reply to: David Rowley (#43)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Fri, Jun 05, 2020 at 12:26:15PM +1200, David Rowley wrote:
On Mon, 25 May 2020 at 19:14, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Mon, May 25, 2020 at 06:34:30AM +1200, David Rowley wrote:
The difference will be that you'd be setting some distinct_uniquekeys
in standard_qp_callback() to explicitly request that some skip scan
paths be created for the uniquekeys, whereas the patch here just does
not bother doing DISTINCT if the upper relation already has unique
keys that state that the DISTINCT is not required. The skip scans
patch should check if the RelOptInfo for the uniquekeys set in
standard_qp_callback() are already mentioned in the RelOptInfo's
uniquekeys. If they are then there's no point in skip scanning as the
rel is already unique for the distinct_uniquekeys.

It sounds like it makes the semantics of UniqueKey a bit more confusing,
doesn't it? At the moment it says:

Represents the unique properties held by a RelOptInfo.

With the proposed changes it would be "unique properties that are held"
and "unique properties that are requested", which are partially
duplicated, but stored in different fields. From the skip scan
patch perspective it probably doesn't make any difference; it seems like
the implementation would be almost the same, just the created UniqueKeys
would be of a different type. But I'm afraid potential future users of
UniqueKeys could be easily confused.

If there's some comment that says UniqueKeys are for RelOptInfos, then
perhaps that comment just needs to be expanded to mention the Path
uniqueness when we add the uniquekeys field to Path.

My concerns are more about having two different sets of distinct
uniquekeys:

* one prepared in standard_qp_callback for skip scan (I guess those
should be added to PlannerInfo?)

* one in create_distinct_paths as per current implementation

with what seems to be similar content.

I think the main point of basing skip scans on top of this uniquekeys
patch is to ensure it's the right thing for the job. I don't think
it's realistic to be maintaining two different sets of infrastructure
which serve a very similar purpose. It's important we make UniqueKeys
general purpose enough to support future useful forms of optimisation.
Basing skip scans on it seems like a good exercise towards that. I'm
not expecting that we need to make zero changes here to allow it to
work well with skip scans.

Sure, no one suggests having two ways of saying "this thing is unique".
I'm just trying to figure out how to make skip scan and uniquekeys play
together without rough edges.

#48David Rowley
dgrowleyml@gmail.com
In reply to: Dmitry Dolgov (#47)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Sat, 6 Jun 2020 at 21:15, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

My concerns are more about having two different sets of distinct
uniquekeys:

* one prepared in standard_qp_callback for skip scan (I guess those
should be added to PlannerInfo?)

Yes. Those must be set so that we know if and what we should try to
create Skip Scan Index paths for. Just like we'll create index paths
for PlannerInfo.query_pathkeys.

* one in create_distinct_paths as per current implementation

with what seems to be similar content.

I think we need to have UniqueKeys in RelOptInfo so we can describe
what a relation is unique by. There's no point for example in
creating skip scan paths for a relation that's already unique on
whatever we might try to skip scan on. e.g someone does:

SELECT DISTINCT unique_and_indexed_column FROM tab;

Since there's a unique index on unique_and_indexed_column then we
needn't try to create a skipscan path for it.

However, the advantages of having UniqueKeys on the RelOptInfo goes a
little deeper than that. We can make use of it anywhere where we
currently do relation_has_unique_index_for() for. Plus we get what
Andy wants and can skip useless DISTINCT operations when the result is
already unique on the distinct clause. Sure we could carry all the
relation's unique properties around in Paths, but that's not the right
place. It's logically a property of the relation, not the path
specifically. RelOptInfo is a good place to store the properties of
relations.

The idea of the meaning of uniquekeys within a path is that the path
is specifically making those keys unique. We're not duplicating the
RelOptInfo's uniquekeys there.

If we have a table like:

CREATE TABLE tab (
a INT PRIMARY KEY,
b INT NOT NULL
);

CREATE INDEX tab_b_idx ON tab (b);

Then I'd expect a query such as: SELECT DISTINCT b FROM tab; to have
the uniquekeys for tab's RelOptInfo set to {a}, and the seqscan and
index scan paths uniquekey properties set to NULL, but the skipscan
index path uniquekeys for tab_b_idx set to {b}. Then when we go
create the distinct paths Andy's work will see that there's no
RelOptInfo uniquekeys for the distinct clause, but the skip scan work
will loop over the unique_pathlist and find that we have a skipscan
path with the required uniquekeys, a.k.a {b}.

Does that make sense?
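
The worked example above can be restated as a small Python check (the set
representation is invented; it only mirrors the email's reasoning):

```python
# tab (a PRIMARY KEY, b NOT NULL), index tab_b_idx on (b),
# query: SELECT DISTINCT b FROM tab.
def satisfied(uniquekeys, required):
    return any(uk <= required for uk in uniquekeys)

rel_uniquekeys = [{"a"}]  # RelOptInfo uniquekeys, from the PK on a
required = {"b"}          # requested in standard_qp_callback()

# Andy's patch: the rel is not already unique on {b}, so the DISTINCT
# cannot simply be dropped.
assert not satisfied(rel_uniquekeys, required)

# Skip scan work: a skipscan path on tab_b_idx carries uniquekeys {b},
# which does satisfy the requirement, so that path can serve the DISTINCT.
skipscan_path_uniquekeys = [{"b"}]
assert satisfied(skipscan_path_uniquekeys, required)
print("skipscan path satisfies required uniquekeys")
```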

David

#49Dmitry Dolgov
9erthalion6@gmail.com
In reply to: David Rowley (#48)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Sun, Jun 07, 2020 at 06:51:22PM +1200, David Rowley wrote:

* one in create_distinct_paths as per current implementation

with what seems to be similar content.

I think we need to have UniqueKeys in RelOptInfo so we can describe
what a relation is unique by. There's no point for example in
creating skip scan paths for a relation that's already unique on
whatever we might try to skip scan on. e.g someone does:

SELECT DISTINCT unique_and_indexed_column FROM tab;

Since there's a unique index on unique_and_indexed_column then we
needn't try to create a skipscan path for it.

However, the advantages of having UniqueKeys on the RelOptInfo goes a
little deeper than that. We can make use of it anywhere where we
currently do relation_has_unique_index_for() for. Plus we get what
Andy wants and can skip useless DISTINCT operations when the result is
already unique on the distinct clause. Sure we could carry all the
relation's unique properties around in Paths, but that's not the right
place. It's logically a property of the relation, not the path
specifically. RelOptInfo is a good place to store the properties of
relations.

The idea of the meaning of uniquekeys within a path is that the path
is specifically making those keys unique. We're not duplicating the
RelOptInfo's uniquekeys there.

If we have a table like:

CREATE TABLE tab (
a INT PRIMARY KEY,
b INT NOT NULL
);

CREATE INDEX tab_b_idx ON tab (b);

Then I'd expect a query such as: SELECT DISTINCT b FROM tab; to have
the uniquekeys for tab's RelOptInfo set to {a}, and the seqscan and
index scan paths uniquekey properties set to NULL, but the skipscan
index path uniquekeys for tab_b_idx set to {b}. Then when we go
create the distinct paths Andy's work will see that there's no
RelOptInfo uniquekeys for the distinct clause, but the skip scan work
will loop over the unique_pathlist and find that we have a skipscan
path with the required uniquekeys, a.k.a {b}.

Does that make sense?

Yes, from this point of view it makes sense. I've already posted the
first version of index skip scan based on this implementation [1]. There
could be rough edges, but overall I hope we're on the same page.

[1]: /messages/by-id/20200609102247.jdlatmfyeecg52fi@localhost

#50Andy Fan
zhihui.fan1213@gmail.com
In reply to: Dmitry Dolgov (#49)
6 attachment(s)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

I just did another self-review of this patch and incorporated some suggestions
based on the discussion above. The attached is the v9 version. When you check
the uniquekey patch, README.uniquekey should be a good place to start.

Main changes in v9 include:

1. Called populate_baserel_uniquekeys after check_index_predicates.
2. Removed the UniqueKey->onerow flag since we can tell it by exprs == NIL.
3. Expression index code improvements.
4. Code & comments refactoring.

As for the Index Skip Scan, I still have not merged the changes from the Index
Skip Scan patch [1]. We may need some additions for that, but probably don't
need to modify the existing code. Once we can finalize it, we can add it in
that patch. I will keep a close eye on it as well.

[1]: /messages/by-id/20200609102247.jdlatmfyeecg52fi@localhost

--
Best Regards
Andy Fan

Attachments:

v9-0005-Treat-the-input-as-sorted-if-the-group-by-clause-.patchapplication/octet-stream; name=v9-0005-Treat-the-input-as-sorted-if-the-group-by-clause-.patchDownload
From 9c2c1c6310150e9ac12f16259e16a07616a1f479 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 6 May 2020 16:32:28 +0800
Subject: [PATCH v9 5/6] Treat the input as sorted if the group by clause is
 unique already.

In the previous commit, if no aggregation call, the group node will
be removed totally, so this patch is just for the cases where aggregation
call exists
---
 src/backend/commands/explain.c           |   4 +
 src/backend/executor/nodeAgg.c           |   6 ++
 src/backend/optimizer/plan/createplan.c  |   3 +-
 src/backend/optimizer/plan/planner.c     | 127 +++++++++++++++--------
 src/include/nodes/nodes.h                |   3 +-
 src/include/nodes/pathnodes.h            |   1 +
 src/include/nodes/plannodes.h            |   1 +
 src/test/regress/expected/aggregates.out |  42 ++++++++
 src/test/regress/sql/aggregates.sql      |  17 +++
 9 files changed, 159 insertions(+), 45 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a131d15ac0..e122809454 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1930,6 +1930,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			show_agg_keys(castNode(AggState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			show_hashagg_info((AggState *) planstate, es);
+			if (es->format != EXPLAIN_FORMAT_TEXT ||
+				(es->verbose && ((Agg *) plan)->input_unique))
+				ExplainPropertyBool("Input Unique",
+									((Agg *) plan)->input_unique, es);
 			if (plan->qual)
 				show_instrumentation_count("Rows Removed by Filter", 1,
 										   planstate, es);
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index a20554ae65..6ecef58b88 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -2139,6 +2139,12 @@ ExecAgg(PlanState *pstate)
 			case AGG_SORTED:
 				result = agg_retrieve_direct(node);
 				break;
+			case AGG_UNIQUE:
+				/* AGG_UNIQUE is translated to AGG_SORTED, Handle it here
+				 * to make compiler quiet.
+				 */
+				Assert(false);
+				break;
 		}
 
 		if (!TupIsNull(result))
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index eb9543f6ad..e6fe2d249b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -6373,7 +6373,7 @@ make_agg(List *tlist, List *qual,
 	/* Reduce to long, but 'ware overflow! */
 	numGroups = (long) Min(dNumGroups, (double) LONG_MAX);
 
-	node->aggstrategy = aggstrategy;
+	node->aggstrategy = aggstrategy == AGG_UNIQUE ? AGG_SORTED : aggstrategy;
 	node->aggsplit = aggsplit;
 	node->numCols = numGroupCols;
 	node->grpColIdx = grpColIdx;
@@ -6384,6 +6384,7 @@ make_agg(List *tlist, List *qual,
 	node->aggParams = NULL;		/* SS_finalize_plan() will fill this */
 	node->groupingSets = groupingSets;
 	node->chain = chain;
+	node->input_unique = aggstrategy == AGG_UNIQUE;
 
 	plan->qual = qual;
 	plan->targetlist = tlist;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 999c1250c4..9d9cbdd569 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3852,51 +3852,61 @@ create_grouping_paths(PlannerInfo *root,
 		int			flags = 0;
 		GroupPathExtraData extra;
 
-		/*
-		 * Determine whether it's possible to perform sort-based
-		 * implementations of grouping.  (Note that if groupClause is empty,
-		 * grouping_is_sortable() is trivially true, and all the
-		 * pathkeys_contained_in() tests will succeed too, so that we'll
-		 * consider every surviving input path.)
-		 *
-		 * If we have grouping sets, we might be able to sort some but not all
-		 * of them; in this case, we need can_sort to be true as long as we
-		 * must consider any sorted-input plan.
-		 */
-		if ((gd && gd->rollups != NIL)
-			|| grouping_is_sortable(parse->groupClause))
-			flags |= GROUPING_CAN_USE_SORT;
+		if (group_unique_input)
+		{
+			/* In this case we don't need to set other flags */
+			Assert(parse->groupClause != NIL);
+			Assert(gd == NULL);
+			flags |= GROUPING_INPUT_UNIQUE;
+		}
+		else
+		{
+			/*
+			 * Determine whether it's possible to perform sort-based
+			 * implementations of grouping.  (Note that if groupClause is empty,
+			 * grouping_is_sortable() is trivially true, and all the
+			 * pathkeys_contained_in() tests will succeed too, so that we'll
+			 * consider every surviving input path.)
+			 *
+			 * If we have grouping sets, we might be able to sort some but not all
+			 * of them; in this case, we need can_sort to be true as long as we
+			 * must consider any sorted-input plan.
+			 */
+			if ((gd && gd->rollups != NIL)
+				|| grouping_is_sortable(parse->groupClause))
+				flags |= GROUPING_CAN_USE_SORT;
 
-		/*
-		 * Determine whether we should consider hash-based implementations of
-		 * grouping.
-		 *
-		 * Hashed aggregation only applies if we're grouping. If we have
-		 * grouping sets, some groups might be hashable but others not; in
-		 * this case we set can_hash true as long as there is nothing globally
-		 * preventing us from hashing (and we should therefore consider plans
-		 * with hashes).
-		 *
-		 * Executor doesn't support hashed aggregation with DISTINCT or ORDER
-		 * BY aggregates.  (Doing so would imply storing *all* the input
-		 * values in the hash table, and/or running many sorts in parallel,
-		 * either of which seems like a certain loser.)  We similarly don't
-		 * support ordered-set aggregates in hashed aggregation, but that case
-		 * is also included in the numOrderedAggs count.
-		 *
-		 * Note: grouping_is_hashable() is much more expensive to check than
-		 * the other gating conditions, so we want to do it last.
-		 */
-		if ((parse->groupClause != NIL &&
-			 agg_costs->numOrderedAggs == 0 &&
-			 (gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
-			flags |= GROUPING_CAN_USE_HASH;
+			/*
+			 * Determine whether we should consider hash-based implementations of
+			 * grouping.
+			 *
+			 * Hashed aggregation only applies if we're grouping. If we have
+			 * grouping sets, some groups might be hashable but others not; in
+			 * this case we set can_hash true as long as there is nothing globally
+			 * preventing us from hashing (and we should therefore consider plans
+			 * with hashes).
+			 *
+			 * Executor doesn't support hashed aggregation with DISTINCT or ORDER
+			 * BY aggregates.  (Doing so would imply storing *all* the input
+			 * values in the hash table, and/or running many sorts in parallel,
+			 * either of which seems like a certain loser.)  We similarly don't
+			 * support ordered-set aggregates in hashed aggregation, but that case
+			 * is also included in the numOrderedAggs count.
+			 *
+			 * Note: grouping_is_hashable() is much more expensive to check than
+			 * the other gating conditions, so we want to do it last.
+			 */
+			if ((parse->groupClause != NIL &&
+				 agg_costs->numOrderedAggs == 0 &&
+				 (gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
+				flags |= GROUPING_CAN_USE_HASH;
 
-		/*
-		 * Determine whether partial aggregation is possible.
-		 */
-		if (can_partial_agg(root, agg_costs))
-			flags |= GROUPING_CAN_PARTIAL_AGG;
+			/*
+			 * Determine whether partial aggregation is possible.
+			 */
+			if (can_partial_agg(root, agg_costs))
+				flags |= GROUPING_CAN_PARTIAL_AGG;
+		}
 
 		extra.flags = flags;
 		extra.target_parallel_safe = target_parallel_safe;
@@ -6522,9 +6532,40 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	ListCell   *lc;
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+	bool		group_input_unique = (extra->flags & GROUPING_INPUT_UNIQUE) != 0;
 	List	   *havingQual = (List *) extra->havingQual;
 	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
 
+	if (group_input_unique)
+	{
+		Path *path = input_rel->cheapest_total_path;
+		add_path(grouped_rel, (Path *) create_agg_path(root,
+													   grouped_rel,
+													   path,
+													   grouped_rel->reltarget,
+													   AGG_UNIQUE,
+													   AGGSPLIT_SIMPLE,
+													   parse->groupClause,
+													   havingQual,
+													   agg_costs,
+													   dNumGroups));
+
+		if (path != input_rel->cheapest_startup_path)
+		{
+			path = input_rel->cheapest_startup_path;
+			add_path(grouped_rel, (Path *) create_agg_path(root,
+														   grouped_rel,
+														   path,
+														   grouped_rel->reltarget,
+														   AGG_UNIQUE,
+														   AGGSPLIT_SIMPLE,
+														   parse->groupClause,
+														   havingQual,
+														   agg_costs,
+														   dNumGroups));
+		}
+		return;
+	}
 	if (can_sort)
 	{
 		/*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 41110ed888..010266ed4b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -761,7 +761,8 @@ typedef enum AggStrategy
 	AGG_PLAIN,					/* simple agg across all input rows */
 	AGG_SORTED,					/* grouped agg, input must be sorted */
 	AGG_HASHED,					/* grouped agg, use internal hashtable */
-	AGG_MIXED					/* grouped agg, hash and sort both used */
+	AGG_MIXED,					/* grouped agg, hash and sort both used */
+	AGG_UNIQUE					/* grouped agg, input already unique per group clause */
 } AggStrategy;
 
 /*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 02e4458bef..b4bb4e6267 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2472,6 +2472,7 @@ typedef struct JoinPathExtraData
 #define GROUPING_CAN_USE_SORT       0x0001
 #define GROUPING_CAN_USE_HASH       0x0002
 #define GROUPING_CAN_PARTIAL_AGG	0x0004
+#define GROUPING_INPUT_UNIQUE		0x0008
 
 /*
  * What kind of partitionwise aggregation is in use?
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 83e01074ed..be31020a40 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -828,6 +828,7 @@ typedef struct Agg
 	/* Note: planner provides numGroups & aggParams only in HASHED/MIXED case */
 	List	   *groupingSets;	/* grouping sets to use */
 	List	   *chain;			/* chained Agg/Sort nodes */
+	bool		input_unique;   /* input is already unique */
 } Agg;
 
 /* ----------------
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 87fef9f417..582b55b1cf 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2699,6 +2699,46 @@ set work_mem to default;
 ----+----+----
 (0 rows)
 
+create table agg_unique_1(pk int primary key,  b int);
+create table agg_unique_2(a int, unsortable_col xid);
+insert into agg_unique_2 values(1, '1'), (2, '2'), (2, '1');
+explain (costs off, verbose)  select pk, sum(b) from agg_unique_1
+group by pk;
+              QUERY PLAN               
+---------------------------------------
+ GroupAggregate
+   Output: pk, sum(b)
+   Group Key: agg_unique_1.pk
+   Input Unique: true
+   ->  Seq Scan on public.agg_unique_1
+         Output: pk, b
+(6 rows)
+
+explain (costs off, verbose) select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+                    QUERY PLAN                     
+---------------------------------------------------
+ GroupAggregate
+   Output: agg_unique_2.unsortable_col, count(*)
+   Group Key: agg_unique_2.unsortable_col
+   Input Unique: true
+   ->  HashAggregate
+         Output: agg_unique_2.unsortable_col
+         Group Key: agg_unique_2.unsortable_col
+         ->  Seq Scan on public.agg_unique_2
+               Output: agg_unique_2.unsortable_col
+(9 rows)
+
+select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+ unsortable_col | count 
+----------------+-------
+              2 |     1
+              1 |     1
+(2 rows)
+
 drop table agg_group_1;
 drop table agg_group_2;
 drop table agg_group_3;
@@ -2707,3 +2747,5 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_unique_1;
+drop table agg_unique_2;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index ad025206fb..439eec18ac 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1202,6 +1202,21 @@ set work_mem to default;
   union all
 (select * from agg_group_4 except select * from agg_hash_4);
 
+create table agg_unique_1(pk int primary key,  b int);
+create table agg_unique_2(a int, unsortable_col xid);
+insert into agg_unique_2 values(1, '1'), (2, '2'), (2, '1');
+
+explain (costs off, verbose)  select pk, sum(b) from agg_unique_1
+group by pk;
+
+explain (costs off, verbose) select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+
+select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+
 drop table agg_group_1;
 drop table agg_group_2;
 drop table agg_group_3;
@@ -1210,3 +1225,5 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_unique_1;
+drop table agg_unique_2;
-- 
2.21.0

Attachment: v9-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch (application/octet-stream)
From 6f1ce0578c407767a40176456c36db51b2c7a252 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sun, 3 May 2020 22:37:46 +0800
Subject: [PATCH v9 1/6] Introduce RelOptInfo->notnullattrs attribute

The notnullattrs set is calculated from the catalog and from run-time
query quals. That information is translated to the child relations as
well for partitioned tables.
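Not part of the patch, just an illustrative sketch of the two sources this field draws from (the table and column names are hypothetical):

```sql
-- Hypothetical example: two ways an attribute can be proven not null.
CREATE TABLE p (a int NOT NULL, b int) PARTITION BY RANGE (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (0) TO (100);

-- p.a is not null per the catalog (attnotnull); p.b is not null here
-- because the strict qual "b > 0" rejects NULLs.  Both would be
-- recorded in RelOptInfo->notnullattrs, and the parent's set is then
-- translated to the child partition p1.
SELECT * FROM p WHERE b > 0;
```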
---
 src/backend/optimizer/path/allpaths.c  | 31 ++++++++++++++++++++++++++
 src/backend/optimizer/plan/initsplan.c | 10 +++++++++
 src/backend/optimizer/util/plancat.c   | 10 +++++++++
 src/include/nodes/pathnodes.h          |  2 ++
 4 files changed, 53 insertions(+)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d984da25d7..f43cd60cbd 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1002,6 +1002,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		RelOptInfo *childrel;
 		ListCell   *parentvars;
 		ListCell   *childvars;
+		int i = -1;
 
 		/* append_rel_list contains all append rels; ignore others */
 		if (appinfo->parent_relid != parentRTindex)
@@ -1058,6 +1059,36 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 								   (Node *) rel->reltarget->exprs,
 								   1, &appinfo);
 
+		/* Copy notnullattrs. */
+		while ((i = bms_next_member(rel->notnullattrs, i)) > 0)
+		{
+			AttrNumber attno = i + FirstLowInvalidHeapAttributeNumber;
+			AttrNumber child_attno;
+			if (attno == 0)
+			{
+				/* Whole row is not null, so must be same for child */
+				childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+														attno - FirstLowInvalidHeapAttributeNumber);
+				break;
+			}
+			if (attno < 0)
+				/* no need to translate system column */
+				child_attno = attno;
+			else
+			{
+				Node * node = list_nth(appinfo->translated_vars, attno - 1);
+				if (!IsA(node, Var))
+					/* This may happen in the UNION case, e.g. (SELECT a FROM t1
+					 * UNION SELECT a + 3 FROM t2) t, where we know t.a is not null.
+					 */
+					continue;
+				child_attno = castNode(Var, node)->varattno;
+			}
+
+			childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+													child_attno - FirstLowInvalidHeapAttributeNumber);
+		}
+
 		/*
 		 * We have to make child entries in the EquivalenceClass data
 		 * structures as well.  This is needed either if the parent
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e978b491f6..95b1b14cd3 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -830,6 +830,16 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		{
 			Node	   *qual = (Node *) lfirst(l);
 
+			/* Set the not null info now */
+			ListCell	*lc;
+			List		*non_nullable_vars = find_nonnullable_vars(qual);
+			foreach(lc, non_nullable_vars)
+			{
+				Var *var = lfirst_node(Var, lc);
+				RelOptInfo *rel = root->simple_rel_array[var->varno];
+				rel->notnullattrs = bms_add_member(rel->notnullattrs,
+												   var->varattno - FirstLowInvalidHeapAttributeNumber);
+			}
 			distribute_qual_to_rels(root, qual,
 									false, below_outer_join, JOIN_INNER,
 									root->qual_security_level,
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 25545029d7..0b2f9d398a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -463,6 +464,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->notnullattrs == NULL);
+	for(i = 0; i < relation->rd_att->natts; i++)
+	{
+		FormData_pg_attribute attr = relation->rd_att->attrs[i];
+		if (attr.attnotnull)
+			rel->notnullattrs = bms_add_member(rel->notnullattrs,
+											   attr.attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 485d1b06c9..9e3ebd488a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -709,6 +709,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	/* Not-null attrs; attnos are offset by -FirstLowInvalidHeapAttributeNumber */
+	Bitmapset		*notnullattrs;
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
-- 
2.21.0

Attachment: v9-0003-Refactoring-existing-uniqueness-related-code-to-u.patch (application/octet-stream)
From f987d2f34f566993be8e2e8a394732585f9fe076 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 6 May 2020 15:44:15 +0800
Subject: [PATCH v9 3/6] Refactoring existing uniqueness related code to use
 UniqueKey

The calls of remove_useless_joins and reduce_unique_semijoins are
postponed as part of this change as well.
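For context, a sketch (mine, not from the patch) of the two optimizations whose call sites are being moved; the table names are hypothetical:

```sql
-- Hypothetical illustration of the optimizations routed through UniqueKey.
CREATE TABLE t1 (id int PRIMARY KEY, v int);
CREATE TABLE t2 (id int PRIMARY KEY, t1_id int);

-- remove_useless_joins: t1 contributes no output columns and its
-- primary key guarantees at most one match, so the left join can be
-- removed entirely.
SELECT t2.* FROM t2 LEFT JOIN t1 ON t2.t1_id = t1.id;

-- reduce_unique_semijoins: the IN sublink can become a plain inner
-- join because t1.id is unique.
SELECT * FROM t2 WHERE t2.t1_id IN (SELECT id FROM t1);
```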
---
 .../postgres_fdw/expected/postgres_fdw.out    |  32 ++--
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   1 +
 src/backend/optimizer/path/allpaths.c         |  18 ++-
 src/backend/optimizer/plan/analyzejoins.c     | 137 ++++--------------
 src/backend/optimizer/plan/planmain.c         |  13 --
 src/test/regress/expected/join.out            |  59 ++++----
 src/test/regress/sql/join.sql                 |  16 +-
 7 files changed, 97 insertions(+), 179 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 82fc1290ef..a9503488ff 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1378,6 +1378,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1
 (8 rows)
 
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY: The inner table in the plan is "T 3", which has a primary key on c1; that is why the plan now shows "Inner Unique: true".
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
                                                                                                                                                                                              QUERY PLAN                                                                                                                                                                                             
@@ -1386,6 +1387,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
    Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
    ->  Nested Loop
          Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+         Inner Unique: true
          ->  Foreign Scan
                Output: ft4.c1, ft4.*, ft5.c1, ft5.*
                Relations: (public.ft4) FULL JOIN (public.ft5)
@@ -1410,7 +1412,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
                ->  Seq Scan on "S 1"."T 3"
                      Output: "T 3".c1, "T 3".ctid
                      Filter: ("T 3".c1 = 50)
-(28 rows)
+(29 rows)
 
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
  c1 | a  | b  
@@ -2902,22 +2904,20 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 f
 -- Outer query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
- Unique
+                                                       QUERY PLAN                                                       
+------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: ((SubPlan 1))
-   ->  Sort
-         Output: ((SubPlan 1))
-         Sort Key: ((SubPlan 1))
-         ->  Foreign Scan
-               Output: (SubPlan 1)
-               Relations: Aggregate on (public.ft2 t2)
-               Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
-               SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+   Sort Key: ((SubPlan 1))
+   ->  Foreign Scan
+         Output: (SubPlan 1)
+         Relations: Aggregate on (public.ft2 t2)
+         Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
+         SubPlan 1
+           ->  Foreign Scan on public.ft1 t1
+                 Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                 Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(11 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..a42cfa134d 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -453,6 +453,7 @@ EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY: The inner table in the plan is "T 3", which has a primary key on c1; that is why the plan now shows "Inner Unique: true".
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 509699f973..49b50fdda5 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -39,6 +39,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
@@ -222,13 +223,24 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	set_base_rel_pathlists(root);
 
 	/*
-	 * Generate access paths for the entire join tree.
+	 * Remove any useless outer joins.  Ideally this would be done during
+	 * jointree preprocessing, but the necessary information isn't available
+	 * until we've built baserel data structures, classified qual clauses,
+	 * and computed uniquekeys.
 	 */
-	rel = make_rel_from_joinlist(root, joinlist);
+	joinlist = remove_useless_joins(root, joinlist);
+
+	/*
+	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
+	 * Likewise, this can't be done until now for lack of needed info.
+	 */
+	reduce_unique_semijoins(root);
 
 	/*
-	 * The result should join all and only the query's base rels.
+	 * Generate access paths for the entire join tree.
 	 */
+	rel = make_rel_from_joinlist(root, joinlist);
+
 	Assert(bms_equal(rel->relids, root->all_baserels));
 
 	return rel;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0ff660284..d80bff65d2 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -439,6 +439,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
 	 * There may be references to the rel in root->fkey_list, but if so,
 	 * match_foreign_keys_to_quals() will get rid of them.
 	 */
+
+	root->all_baserels = bms_del_member(root->all_baserels, relid);
 }
 
 /*
@@ -584,39 +586,7 @@ reduce_unique_semijoins(PlannerInfo *root)
 static bool
 rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 {
-	/* We only know about baserels ... */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * For a plain relation, we only know how to prove uniqueness by
-		 * reference to unique indexes.  Make sure there's at least one
-		 * suitable unique index.  It must be immediately enforced, and if
-		 * it's a partial index, it must match the query.  (Keep these
-		 * conditions in sync with relation_has_unique_index_for!)
-		 */
-		ListCell   *lc;
-
-		foreach(lc, rel->indexlist)
-		{
-			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
-
-			if (ind->unique && ind->immediate &&
-				(ind->indpred == NIL || ind->predOK))
-				return true;
-		}
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Query	   *subquery = root->simple_rte_array[rel->relid]->subquery;
-
-		/* Check if the subquery has any qualities that support distinctness */
-		if (query_supports_distinctness(subquery))
-			return true;
-	}
-	/* We have no proof rules for any other rtekinds. */
-	return false;
+	return rel->uniquekeys != NIL;
 }
 
 /*
@@ -640,83 +610,33 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 static bool
 rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
 {
-	/*
-	 * We could skip a couple of tests here if we assume all callers checked
-	 * rel_supports_distinctness first, but it doesn't seem worth taking any
-	 * risk for.
-	 */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * Examine the indexes to see if we have a matching unique index.
-		 * relation_has_unique_index_for automatically adds any usable
-		 * restriction clauses for the rel, so we needn't do that here.
-		 */
-		if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
-			return true;
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Index		relid = rel->relid;
-		Query	   *subquery = root->simple_rte_array[relid]->subquery;
-		List	   *colnos = NIL;
-		List	   *opids = NIL;
-		ListCell   *l;
 
-		/*
-		 * Build the argument lists for query_is_distinct_for: a list of
-		 * output column numbers that the query needs to be distinct over, and
-		 * a list of equality operators that the output columns need to be
-		 * distinct according to.
-		 *
-		 * (XXX we are not considering restriction clauses attached to the
-		 * subquery; is that worth doing?)
-		 */
-		foreach(l, clause_list)
+	ListCell	*lc1, *lc2, *lc3;
+	foreach(lc1,  rel->uniquekeys)
+	{
+		UniqueKey *uqk = lfirst_node(UniqueKey, lc1);
+		bool all_uqk_exprs_found = true;
+		foreach(lc2, uqk->exprs)
 		{
-			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
-			Oid			op;
-			Var		   *var;
-
-			/*
-			 * Get the equality operator we need uniqueness according to.
-			 * (This might be a cross-type operator and thus not exactly the
-			 * same operator the subquery would consider; that's all right
-			 * since query_is_distinct_for can resolve such cases.)  The
-			 * caller's mergejoinability test should have selected only
-			 * OpExprs.
-			 */
-			op = castNode(OpExpr, rinfo->clause)->opno;
-
-			/* caller identified the inner side for us */
-			if (rinfo->outer_is_left)
-				var = (Var *) get_rightop(rinfo->clause);
-			else
-				var = (Var *) get_leftop(rinfo->clause);
-
-			/*
-			 * We may ignore any RelabelType node above the operand.  (There
-			 * won't be more than one, since eval_const_expressions() has been
-			 * applied already.)
-			 */
-			if (var && IsA(var, RelabelType))
-				var = (Var *) ((RelabelType *) var)->arg;
-
-			/*
-			 * If inner side isn't a Var referencing a subquery output column,
-			 * this clause doesn't help us.
-			 */
-			if (!var || !IsA(var, Var) ||
-				var->varno != relid || var->varlevelsup != 0)
-				continue;
-
-			colnos = lappend_int(colnos, var->varattno);
-			opids = lappend_oid(opids, op);
+			Node *uq_expr = lfirst(lc2);
+			bool found_in_clause_list = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *clause_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				if (equal(uq_expr, clause_expr))
+				{
+					found_in_clause_list = true;
+					break;
+				}
+			}
+			if (!found_in_clause_list)
+			{
+				all_uqk_exprs_found = false;
+				break;
+			}
 		}
-
-		if (query_is_distinct_for(subquery, colnos, opids))
+		if (all_uqk_exprs_found)
 			return true;
 	}
 	return false;
@@ -972,6 +892,9 @@ innerrel_is_unique(PlannerInfo *root,
 	MemoryContext old_context;
 	ListCell   *lc;
 
+	if (relation_is_onerow(innerrel))
+		return true;
+
 	/* Certainly can't prove uniqueness when there are no joinclauses */
 	if (restrictlist == NIL)
 		return false;
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..6ad73cb57b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -213,19 +213,6 @@ query_planner(PlannerInfo *root,
 	 */
 	fix_placeholder_input_needed_levels(root);
 
-	/*
-	 * Remove any useless outer joins.  Ideally this would be done during
-	 * jointree preprocessing, but the necessary information isn't available
-	 * until we've built baserel data structures and classified qual clauses.
-	 */
-	joinlist = remove_useless_joins(root, joinlist);
-
-	/*
-	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
-	 * Likewise, this can't be done until now for lack of needed info.
-	 */
-	reduce_unique_semijoins(root);
-
 	/*
 	 * Now distribute "placeholders" to base rels as needed.  This has to be
 	 * done after join removal because removal could change whether a
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a46b1573bd..8378936eda 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4349,11 +4349,11 @@ reset enable_nestloop;
 --
 begin;
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 -- all three cases should be optimizable into a simple seqscan
@@ -4411,40 +4411,37 @@ select d.* from d left join (select distinct * from b) s
 (1 row)
 
 -- join removal is not possible when the GROUP BY contains a column that is
--- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
--- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
--- but this happens too late for join removal in the outer plan level.)
+-- not in the join condition.
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
-                QUERY PLAN                
-------------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Group
-         Group Key: b.id
-         ->  Index Scan using b_pkey on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.d, b.c_id
+                     ->  Seq Scan on b
 (8 rows)
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(9 rows)
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.c_id, b.d
+                     ->  Seq Scan on b
+(8 rows)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 1403e0ffe7..3312542411 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1479,11 +1479,11 @@ reset enable_nestloop;
 begin;
 
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 
@@ -1512,17 +1512,15 @@ select d.* from d left join (select distinct * from b) s
   on d.a = s.id and d.b = s.c_id;
 
 -- join removal is not possible when the GROUP BY contains a column that is
--- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
--- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
--- but this happens too late for join removal in the outer plan level.)
+-- not in the join condition.
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
-- 
2.21.0

v9-0004-Remove-distinct-node-AggNode-if-the-input-is-uniq.patch (application/octet-stream)
From 8526dc471c0d194b6e08db0fdf309af5290fe9b3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 11 May 2020 15:54:49 +0800
Subject: [PATCH v9 4/6] Remove distinct node & AggNode if the input is unique
 already.

---
 src/backend/optimizer/plan/planner.c          |  30 +-
 src/test/regress/expected/aggregates.out      |  73 ++--
 src/test/regress/expected/select_distinct.out | 397 ++++++++++++++++++
 src/test/regress/sql/aggregates.sql           |   6 +-
 src/test/regress/sql/select_distinct.sql      | 146 +++++++
 5 files changed, 602 insertions(+), 50 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 33d188b17f..999c1250c4 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3812,7 +3812,27 @@ create_grouping_paths(PlannerInfo *root,
 {
 	Query	   *parse = root->parse;
 	RelOptInfo *grouped_rel;
+	bool	group_unique_input = false;
 	RelOptInfo *partially_grouped_rel;
+	List	*groupExprs = NIL;
+
+	if (root->parse->groupingSets == NIL)
+	{
+		groupExprs = get_sortgrouplist_exprs(parse->groupClause,
+											 parse->targetList);
+		/*
+		 * If the GROUP BY expressions are unique already, the grouping node
+		 * is unnecessary as long as there are no aggregate functions.
+		 */
+		group_unique_input = relation_has_uniquekeys_for(root, input_rel,
+														 groupExprs, false);
+		if (group_unique_input &&
+			groupExprs != NIL &&
+			!parse->hasAggs &&
+			!parse->hasWindowFuncs &&
+			parse->havingQual == NULL)
+			return input_rel;
+	}
 
 	/*
 	 * Create grouping relation to hold fully aggregated grouping and/or
@@ -4738,6 +4758,12 @@ create_distinct_paths(PlannerInfo *root,
 	bool		allow_hash;
 	Path	   *path;
 	ListCell   *lc;
+	List	   *distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
+														 parse->targetList);
+
+	/* If the result is unique already, we just return the input_rel directly */
+	if (relation_has_uniquekeys_for(root, input_rel, distinctExprs, false))
+		return input_rel;
 
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
@@ -4775,10 +4801,6 @@ create_distinct_paths(PlannerInfo *root,
 		/*
 		 * Otherwise, the UNIQUE filter has effects comparable to GROUP BY.
 		 */
-		List	   *distinctExprs;
-
-		distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
-												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
 											  NULL);
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 3bd184ae29..87fef9f417 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -973,14 +973,12 @@ explain (costs off)
   select distinct max(unique2) from tenk1;
                              QUERY PLAN                              
 ---------------------------------------------------------------------
- HashAggregate
-   Group Key: $0
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
                  Index Cond: (unique2 IS NOT NULL)
-   ->  Result
-(7 rows)
+(5 rows)
 
 select distinct max(unique2) from tenk1;
  max  
@@ -1139,7 +1137,7 @@ explain (costs off)
   select distinct min(f1), max(f1) from minmaxtest;
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
- Unique
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Merge Append
@@ -1162,10 +1160,7 @@ explain (costs off)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
-   ->  Sort
-         Sort Key: ($0), ($1)
-         ->  Result
-(26 rows)
+(23 rows)
 
 select distinct min(f1), max(f1) from minmaxtest;
  min | max 
@@ -1195,12 +1190,10 @@ create temp table t2 (x int, y int, z int, primary key (x, y));
 create temp table t3 (a int, b int, c int, primary key(a, b) deferrable);
 -- Non-primary-key columns can be removed from GROUP BY
 explain (costs off) select * from t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 -- No removal can happen if the complete PK is not present in GROUP BY
 explain (costs off) select a,c from t1 group by a,c,d;
@@ -1212,29 +1205,27 @@ explain (costs off) select a,c from t1 group by a,c,d;
 (3 rows)
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.y
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
                       QUERY PLAN                      
 ------------------------------------------------------
  HashAggregate
    Group Key: t1.a, t1.b, t2.x, t2.z
-   ->  Hash Join
+   ->  Hash Left Join
          Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
          ->  Seq Scan on t2
          ->  Hash
@@ -1264,12 +1255,10 @@ explain (costs off) select * from t1 group by a,b,c,d;
 
 -- Okay to remove columns if we're only querying the parent.
 explain (costs off) select * from only t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 create temp table p_t1 (
   a int,
@@ -1282,14 +1271,12 @@ create temp table p_t1_1 partition of p_t1 for values in(1);
 create temp table p_t1_2 partition of p_t1 for values in(2);
 -- Ensure we can remove non-PK columns for partitioned tables.
 explain (costs off) select * from p_t1 group by a,b,c,d;
-           QUERY PLAN           
---------------------------------
- HashAggregate
-   Group Key: p_t1.a, p_t1.b
-   ->  Append
-         ->  Seq Scan on p_t1_1
-         ->  Seq Scan on p_t1_2
-(5 rows)
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on p_t1_1
+   ->  Seq Scan on p_t1_2
+(3 rows)
 
 drop table t1 cascade;
 NOTICE:  drop cascades to table t1c
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 11c6f50fbf..da204734e2 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -306,3 +306,400 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+    QUERY PLAN    
+------------------
+ Seq Scan on uqk1
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+           QUERY PLAN            
+---------------------------------
+ HashAggregate
+   Group Key: d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+     QUERY PLAN     
+--------------------
+ Seq Scan on uqk1
+   Filter: (pk = 1)
+(2 rows)
+
+-- Test join
+-- both uqk1(c, d) and uqk2(pk) are unique keys, so DISTINCT is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+           QUERY PLAN            
+---------------------------------
+ Hash Join
+   Hash Cond: (uqk1.a = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+   ->  Hash
+         ->  Seq Scan on uqk2
+(6 rows)
+
+-- DISTINCT is needed since the outer join can produce multiple NULL values.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+                 QUERY PLAN                  
+---------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.pk
+         ->  Hash Right Join
+               Hash Cond: (uqk1.a = uqk2.pk)
+               ->  Seq Scan on uqk1
+               ->  Hash
+                     ->  Seq Scan on uqk2
+(8 rows)
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+ pk 
+----
+  1
+   
+(2 rows)
+
+-- DISTINCT is not needed since uqk1 is the left table of the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+           QUERY PLAN            
+---------------------------------
+ Sort
+   Sort Key: uqk1.c, uqk1.d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+ c | d 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+(3 rows)
+
+-- The combination of UniqueKeys is still unique regardless of the join clause
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+          QUERY PLAN          
+------------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+   ->  Materialize
+         ->  Seq Scan on uqk2
+(4 rows)
+
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+ a | pk | c | d | a | pk | c | d 
+---+----+---+---+---+----+---+---
+ 1 |  1 | 1 | 1 | 1 |  1 | 1 | 1
+ 1 |  1 | 1 | 1 | 4 |  4 | 4 | 4
+ 1 |  1 | 1 | 1 | 5 |  5 | 5 | 5
+ 2 |  2 | 2 | 2 | 1 |  1 | 1 | 1
+ 2 |  2 | 2 | 2 | 4 |  4 | 4 | 4
+ 2 |  2 | 2 | 2 | 5 |  5 | 5 | 5
+ 3 |  3 | 3 | 3 | 1 |  1 | 1 | 1
+ 3 |  3 | 3 | 3 | 4 |  4 | 4 | 4
+ 3 |  3 | 3 | 3 | 5 |  5 | 5 | 5
+(9 rows)
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+           QUERY PLAN           
+--------------------------------
+ Hash Semi Join
+   Hash Cond: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+   ->  Hash
+         ->  Seq Scan on uqk2
+(5 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+             QUERY PLAN             
+------------------------------------
+ Seq Scan on uqk1
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on uqk2
+(4 rows)
+
+-- Test UniqueKey for the one-row case; DISTINCT is not needed either.
+-- uqk1.d is a uniquekey due to the onerow rule; uqk2.pk is a primary key
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+            QUERY PLAN             
+-----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+(5 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both uqk1.d and uqk2.c are uniquekeys due to the onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+            QUERY PLAN            
+----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(6 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both UniqueKeys are in the targetList, so DISTINCT is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+        QUERY PLAN        
+--------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(5 rows)
+
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+ c | c 
+---+---
+ 2 | 1
+(1 row)
+
+-- Test SubQuery
+-- t2(a, c) is a unique key because of the GROUP BY
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+              QUERY PLAN               
+---------------------------------------
+ Hash Join
+   Hash Cond: ((sum(uqk2.pk)) = t1.pk)
+   ->  HashAggregate
+         Group Key: uqk2.a, uqk2.c
+         ->  Seq Scan on uqk2
+   ->  Hash
+         ->  Seq Scan on uqk1 t1
+(7 rows)
+
+-- Test partitioned TABLE
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+-- create unique index on dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP INDEX dist_p_uk_b_c;
+-- we also support creating a unique index on each child table
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- Not OK, since dist_p1 has no such index
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ HashAggregate
+   Group Key: dist_p.a, dist_p.b, dist_p.c, dist_p.d
+   ->  Append
+         ->  Seq Scan on dist_p0 dist_p_1
+         ->  Seq Scan on dist_p1 dist_p_2
+(5 rows)
+
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP TABLE dist_p;
+-- Test partitions with different attribute orders.
+CREATE TABLE p (a INT NOT NULL, b DATE NOT NULL,c INT NOT NULL, d VARCHAR(5) NOT NULL) PARTITION BY LIST (a);
+-- First partition with different attr order.
+CREATE TABLE p1(d VARCHAR(5) NOT NULL, c INT NOT NULL, b DATE NOT NULL, a INT NOT NULL);
+ALTER TABLE p ATTACH PARTITION p1 FOR VALUES IN (1);
+-- 2nd partition
+CREATE TABLE p2 PARTITION OF p FOR VALUES IN (2);
+-- Unique index on each partition, without the partition key.
+CREATE UNIQUE index p1_unq_d ON p1(d);
+CREATE UNIQUE index p2_unq_d ON p2(d);
+-- Unique index including the partition key.
+CREATE UNIQUE index p_unq_ac ON  p(a, c);
+ANALYZE p;
+-- d is unique for its owner partition only.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM P;
+           QUERY PLAN           
+--------------------------------
+ HashAggregate
+   Group Key: p.d
+   ->  Append
+         ->  Seq Scan on p1 p_1
+         ->  Seq Scan on p2 p_2
+(5 rows)
+
+-- d is unique for its owner partition, but this query only talks to one partition.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM p WHERE a = 1;
+              QUERY PLAN               
+---------------------------------------
+ Bitmap Heap Scan on p1 p
+   Recheck Cond: (a = 1)
+   ->  Bitmap Index Scan on p1_a_c_idx
+         Index Cond: (a = 1)
+(4 rows)
+
+-- (a, c) is unique only locally, but it contains the partition key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT a, c FROM p WHERE a = 1;
+              QUERY PLAN               
+---------------------------------------
+ Bitmap Heap Scan on p1 p
+   Recheck Cond: (a = 1)
+   ->  Bitmap Index Scan on p1_a_c_idx
+         Index Cond: (a = 1)
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c FROM p WHERE a = 1;
+              QUERY PLAN               
+---------------------------------------
+ Bitmap Heap Scan on p1 p
+   Recheck Cond: (a = 1)
+   ->  Bitmap Index Scan on p1_a_c_idx
+         Index Cond: (a = 1)
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM p WHERE c = 1;
+                 QUERY PLAN                  
+---------------------------------------------
+ Append
+   ->  Bitmap Heap Scan on p1 p_1
+         Recheck Cond: (c = 1)
+         ->  Bitmap Index Scan on p1_a_c_idx
+               Index Cond: (c = 1)
+   ->  Bitmap Heap Scan on p2 p_2
+         Recheck Cond: (c = 1)
+         ->  Bitmap Index Scan on p2_a_c_idx
+               Index Cond: (c = 1)
+(9 rows)
+
+DROP TABLE p;
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+DEALLOCATE pt;
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
+-- Test Whole Row Not NULL cases
+CREATE TABLE x (a int UNIQUE);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM x WHERE x = '(1)'::x;
+         QUERY PLAN         
+----------------------------
+ Seq Scan on x
+   Filter: (x.* = '(1)'::x)
+(2 rows)
+
+DROP TABLE x;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 044d515507..ad025206fb 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -413,13 +413,13 @@ explain (costs off) select * from t1 group by a,b,c,d;
 explain (costs off) select a,c from t1 group by a,c,d;
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
 
 -- Cannot optimize when PK is deferrable
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 33102744eb..14bf36d908 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -135,3 +135,149 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+
+-- Test join
+-- both uqk1(c, d) and uqk2(pk) are unique keys, so DISTINCT is not needed.
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+
+-- DISTINCT is needed since the outer join can produce multiple NULL values.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+
+-- DISTINCT is not needed since uqk1 is the left table of the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+-- The combination of UniqueKeys is still unique regardless of the join clause
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+
+-- Test UniqueKey for the one-row case; DISTINCT is not needed either.
+-- uqk1.d is a uniquekey due to the onerow rule; uqk2.pk is a primary key
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+
+-- Both uqk1.d and uqk2.c are uniquekeys due to the onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+
+-- Both UniqueKeys are in the targetList, so DISTINCT is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+
+-- Test SubQuery
+-- t2(a, c) is a unique key because of the GROUP BY
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+
+-- Test partitioned TABLE
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+
+-- create unique index on dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+DROP INDEX dist_p_uk_b_c;
+
+-- we also support creating a unique index on each child table
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- Not OK, since dist_p1 has no such index
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+
+DROP TABLE dist_p;
+
+-- Test partitions with different attribute orders.
+CREATE TABLE p (a INT NOT NULL, b DATE NOT NULL,c INT NOT NULL, d VARCHAR(5) NOT NULL) PARTITION BY LIST (a);
+
+-- First partition with different attr order.
+CREATE TABLE p1(d VARCHAR(5) NOT NULL, c INT NOT NULL, b DATE NOT NULL, a INT NOT NULL);
+ALTER TABLE p ATTACH PARTITION p1 FOR VALUES IN (1);
+
+-- 2nd partition
+CREATE TABLE p2 PARTITION OF p FOR VALUES IN (2);
+
+-- Unique index on each partition, without the partition key.
+CREATE UNIQUE index p1_unq_d ON p1(d);
+CREATE UNIQUE index p2_unq_d ON p2(d);
+
+-- Unique index including the partition key.
+CREATE UNIQUE index p_unq_ac ON  p(a, c);
+
+ANALYZE p;
+
+-- d is unique for its owner partition only.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM P;
+
+-- d is unique for its owner partition, but this query only talks to one partition.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM p WHERE a = 1;
+
+-- (a, c) is unique only locally, but it contains the partition key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT a, c FROM p WHERE a = 1;
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c FROM p WHERE a = 1;
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM p WHERE c = 1;
+
+DROP TABLE p;
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+DEALLOCATE pt;
+
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
+
+-- Test Whole Row Not NULL cases
+CREATE TABLE x (a int UNIQUE);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM x WHERE x = '(1)'::x;
+DROP TABLE x;
-- 
2.21.0

v9-0002-Introduce-UniqueKey-attributes-on-RelOptInfo-stru.patch (application/octet-stream)
From d755f4e4d1a5ed97298f09ad776c441dcfb50ce8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 11 May 2020 15:50:52 +0800
Subject: [PATCH v9 2/6] Introduce UniqueKey attributes on RelOptInfo struct.

UniqueKey is a set of exprs on a RelOptInfo which represents that those
exprs are unique on the given RelOptInfo. See README.uniquekey
for more information.
---
 src/backend/nodes/copyfuncs.c               |   13 +
 src/backend/nodes/list.c                    |   31 +
 src/backend/nodes/makefuncs.c               |   13 +
 src/backend/nodes/outfuncs.c                |   11 +
 src/backend/nodes/readfuncs.c               |   10 +
 src/backend/optimizer/path/Makefile         |    3 +-
 src/backend/optimizer/path/README.uniquekey |  131 +++
 src/backend/optimizer/path/allpaths.c       |   10 +
 src/backend/optimizer/path/joinpath.c       |    9 +-
 src/backend/optimizer/path/joinrels.c       |    2 +
 src/backend/optimizer/path/pathkeys.c       |    3 +-
 src/backend/optimizer/path/uniquekeys.c     | 1131 +++++++++++++++++++
 src/backend/optimizer/plan/planner.c        |   13 +-
 src/backend/optimizer/prep/prepunion.c      |    2 +
 src/backend/optimizer/util/appendinfo.c     |   44 +
 src/backend/optimizer/util/inherit.c        |   16 +-
 src/include/nodes/makefuncs.h               |    3 +
 src/include/nodes/nodes.h                   |    1 +
 src/include/nodes/pathnodes.h               |   29 +-
 src/include/nodes/pg_list.h                 |    2 +
 src/include/optimizer/appendinfo.h          |    3 +
 src/include/optimizer/optimizer.h           |    2 +
 src/include/optimizer/paths.h               |   43 +
 23 files changed, 1502 insertions(+), 23 deletions(-)
 create mode 100644 src/backend/optimizer/path/README.uniquekey
 create mode 100644 src/backend/optimizer/path/uniquekeys.c

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d8cf87e6d0..efe7c3ae10 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2273,6 +2273,16 @@ _copyPathKey(const PathKey *from)
 	return newnode;
 }
 
+static UniqueKey *
+_copyUniqueKey(const UniqueKey *from)
+{
+	UniqueKey	*newnode = makeNode(UniqueKey);
+
+	COPY_NODE_FIELD(exprs);
+	COPY_SCALAR_FIELD(multi_nullvals);
+
+	return newnode;
+}
 /*
  * _copyRestrictInfo
  */
@@ -5152,6 +5162,9 @@ copyObjectImpl(const void *from)
 		case T_PathKey:
 			retval = _copyPathKey(from);
 			break;
+		case T_UniqueKey:
+			retval = _copyUniqueKey(from);
+			break;
 		case T_RestrictInfo:
 			retval = _copyRestrictInfo(from);
 			break;
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 80fa8c84e4..a7a99b70f2 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -687,6 +687,37 @@ list_member_oid(const List *list, Oid datum)
 	return false;
 }
 
+/*
+ * return true iff every entry in "members" list is also present
+ * in the "target" list.
+ */
+bool
+list_is_subset(const List *members, const List *target)
+{
+	const ListCell	*lc1, *lc2;
+
+	Assert(IsPointerList(members));
+	Assert(IsPointerList(target));
+	check_list_invariants(members);
+	check_list_invariants(target);
+
+	foreach(lc1, members)
+	{
+		bool found = false;
+		foreach(lc2, target)
+		{
+			if (equal(lfirst(lc1), lfirst(lc2)))
+			{
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			return false;
+	}
+	return true;
+}
+
 /*
  * Delete the n'th cell (counting from 0) in list.
  *
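
The subset check above is purely equality-based: every entry of "members" just has to appear somewhere in "target", so duplicates on the members side are fine and the cost is O(len(members) * len(target)). As a rough illustration of those semantics (a standalone Python sketch, not the PostgreSQL List API):

```python
def list_is_subset(members, target):
    """True iff every entry in `members` also appears in `target`.

    Mirrors the equality-based O(n*m) scan of the patch's C
    list_is_subset(); duplicate entries in `members` are allowed,
    and an empty `members` list is trivially a subset.
    """
    return all(any(m == t for t in target) for m in members)
```

This is exactly the property populate_grouprel_uniquekeys and populate_joinrel_uniquekeys rely on when testing whether a UniqueKey's exprs all survive into a reltarget.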
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index b442b5a29e..7af9c7346d 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -812,3 +812,16 @@ makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols)
 	v->va_cols = va_cols;
 	return v;
 }
+
+
+/*
+ * makeUniqueKey
+ */
+UniqueKey*
+makeUniqueKey(List *exprs, bool multi_nullvals)
+{
+	UniqueKey * ukey = makeNode(UniqueKey);
+	ukey->exprs = exprs;
+	ukey->multi_nullvals = multi_nullvals;
+	return ukey;
+}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e2f177515d..c3a9632992 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2428,6 +2428,14 @@ _outPathKey(StringInfo str, const PathKey *node)
 	WRITE_BOOL_FIELD(pk_nulls_first);
 }
 
+static void
+_outUniqueKey(StringInfo str, const UniqueKey *node)
+{
+	WRITE_NODE_TYPE("UNIQUEKEY");
+	WRITE_NODE_FIELD(exprs);
+	WRITE_BOOL_FIELD(multi_nullvals);
+}
+
 static void
 _outPathTarget(StringInfo str, const PathTarget *node)
 {
@@ -4127,6 +4135,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PathKey:
 				_outPathKey(str, obj);
 				break;
+			case T_UniqueKey:
+				_outUniqueKey(str, obj);
+				break;
 			case T_PathTarget:
 				_outPathTarget(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42050ab719..3a18571d0c 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -452,6 +452,14 @@ _readSetOperationStmt(void)
 	READ_DONE();
 }
 
+static UniqueKey *
+_readUniqueKey(void)
+{
+	READ_LOCALS(UniqueKey);
+	READ_NODE_FIELD(exprs);
+	READ_BOOL_FIELD(multi_nullvals);
+	READ_DONE();
+}
 
 /*
  *	Stuff from primnodes.h.
@@ -2656,6 +2664,8 @@ parseNodeString(void)
 		return_value = _readCommonTableExpr();
 	else if (MATCH("SETOPERATIONSTMT", 16))
 		return_value = _readSetOperationStmt();
+	else if (MATCH("UNIQUEKEY", 9))
+		return_value = _readUniqueKey();
 	else if (MATCH("ALIAS", 5))
 		return_value = _readAlias();
 	else if (MATCH("RANGEVAR", 8))
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..7b9820c25f 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekeys.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/README.uniquekey b/src/backend/optimizer/path/README.uniquekey
new file mode 100644
index 0000000000..5eac761995
--- /dev/null
+++ b/src/backend/optimizer/path/README.uniquekey
@@ -0,0 +1,131 @@
+1. What is UniqueKey?
+We can think of a UniqueKey as a set of exprs on a RelOptInfo which we are sure
+doesn't yield the same result among any of the rows. The simplest UniqueKey
+is a primary key.
+
+However, we define the UniqueKey as below.
+
+typedef struct UniqueKey
+{
+        NodeTag	type;
+        List	*exprs;
+        bool	multi_nullvals;
+} UniqueKey;
+
+exprs is a list of exprs which are unique on the current RelOptInfo. exprs = NIL
+is a special case of UniqueKey, which means there is only one row in that
+relation. It has a stronger semantic than the others: in SELECT uk FROM t, uk is
+a normal unique key and may have different values, while in SELECT colx FROM t
+WHERE uk = const, colx is unique AND we have only 1 value. This property can be
+used for innerrel_is_unique. This logic is handled specially in the
+add_uniquekey_for_onerow function.
+
+multi_nullvals: true means multiple null values may exist in these exprs, so the
+uniqueness is not guaranteed in that case. This field is necessary for
+remove_useless_join & reduce_unique_semijoins, where we don't mind such
+duplicated NULL values. It is set to true in 2 cases. One is a unique key
+from a unique index where the related column is nullable. The other one is for
+outer joins. See populate_joinrel_uniquekeys for details.
+
+
+The UniqueKey can be used in at least the following cases:
+1. remove_useless_joins.
+2. reduce_unique_semijoins.
+3. Remove the distinct node if the distinct clause is unique.
+4. Remove the agg node if the group by clause is unique.
+5. Index Skip Scan (WIP).
+6. Aggregation push down without 2-phase aggregation if the join can't
+   duplicate the aggregated rows. (work-in-progress feature)
+
+2. How is it maintained?
+
+We have a set of populate_xxx_uniquekeys functions to maintain the uniquekeys in
+various cases. xxx includes baserel, joinrel, partitionedrel, distinctrel,
+groupedrel and unionrel. We also need to convert the uniquekeys from a subquery
+to the outer relation, which is what convert_subquery_uniquekeys does.
+
+1. The first part is about baserels. We handle 3 cases. Suppose we have a unique
+index on (a, b).
+
+1. SELECT a, b FROM t.  UniqueKey (a, b)
+2. SELECT a FROM t WHERE b = 1;  UniqueKey (a)
+3. SELECT .. FROM t WHERE a = 1 AND b = 1;  UniqueKey (NIL).  onerow case, every
+   column is unique.
+
+2. The next part is joinrels. This part is the most error-prone, so we simplified
+the rules as below:
+1. If a relation's UniqueKey can't be duplicated after the join, it will still be
+   valid for the join rel. The function we use here is
+   innerrel_keeps_unique. The basic idea is innerrel.any_col = outerrel.uk.
+
+2. If a UniqueKey can't stay valid via rule 1, the combination of the
+   UniqueKeys from both sides is valid for sure.  We can prove this as: if the
+   unique exprs from rel1 are duplicated by rel2, the duplicated rows must
+   contain different unique exprs from rel2.
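
The rule-2 argument can be checked mechanically on toy data. The sketch below (a Python illustration of the proof, not planner code; the relation and column names are made up) inner-joins two relations on a deliberately non-unique column and verifies that while one side's key gets duplicated, the concatenation of the two sides' unique keys stays duplicate-free:

```python
# rel1 and rel2 as lists of dicts; each side has a single-column unique key.
rel1 = [{"a": 1, "x": 10}, {"a": 2, "x": 10}, {"a": 3, "x": 20}]  # uk: (a)
rel2 = [{"b": 1, "x": 10}, {"b": 2, "x": 10}, {"b": 3, "x": 30}]  # uk: (b)

# Inner join on the non-unique column x: rows of rel1 get duplicated.
joinrel = [{**r1, "b": r2["b"]}
           for r1 in rel1 for r2 in rel2
           if r1["x"] == r2["x"]]

# (a) alone is no longer unique in the join result, but (a, b) is: two join
# rows sharing the same a must come from different rel2 rows, hence differ in b.
a_vals = [row["a"] for row in joinrel]
ab_vals = [(row["a"], row["b"]) for row in joinrel]
```

The same argument applies symmetrically, which is why populate_joinrel_uniquekeys can always fall back to the combined UniqueKey when neither side's key survives on its own.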
+
+More considerations about onerow:
+1. Even if a one-row relation can't be duplicated, it may still contain
+   multi_nullvals after an outer join.
+2. If either UniqueKey can be duplicated after the join, we can get one row
+   only when both sides are one row AND there is no outer join.
+3. Whenever the onerow UniqueKey is not valid any more, we need to convert the
+   one-row UniqueKey to a normal unique key since we don't store exprs for a
+   one-row relation. get_exprs_from_uniquekeys is used here.
+
+
+More considerations about multi_nullvals after join:
+1. If the original UniqueKey has multi_nullvals, the final UniqueKey will have
+   multi_nullvals in any case.
+2. Even if a UniqueKey doesn't allow multi_nullvals originally, it may allow
+   them after some outer joins.
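
A left join shows why multi_nullvals can flip from false to true: every NULL-extended row carries NULL in the inner side's key column, so that column stays "unique" only if duplicate NULLs are ignored. A toy simulation (Python standing in for the SQL semantics; the column names are invented for illustration):

```python
outer = [{"a": 1}, {"a": 2}, {"a": 3}]   # uk: (a), no NULLs
inner = [{"a": 1, "b": 10}]              # uk: (b), no NULLs

# LEFT JOIN outer .. inner USING (a): unmatched outer rows are NULL-extended.
joined = []
for o in outer:
    matches = [i for i in inner if i["a"] == o["a"]]
    if matches:
        joined.extend({**o, "b": m["b"]} for m in matches)
    else:
        joined.append({**o, "b": None})  # NULL-extended row

b_vals = [row["b"] for row in joined]
# b never repeats a non-NULL value, yet NULL now appears more than once:
# the UniqueKey (b) survives only with multi_nullvals = true.
```

This is why remove_useless_joins and reduce_unique_semijoins, which don't care about duplicated NULLs, can still use such a key, while distinct elimination cannot.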
+
+
+3. When we come to a subquery, we need convert_subquery_uniquekeys just like
+convert_subquery_pathkeys.  Only a UniqueKey inside the subquery which is
+referenced as a Var in the outer relation will be reused. The relationship
+between outerrel.Var and subquery.exprs is built with the outer rel's
+subroot->processed_tlist.
+
+
+4. As for SRFs, they break the uniqueness of a UniqueKey. However, this is
+handled in adjust_paths_for_srfs, which happens after query_planner, so
+we maintain the UniqueKey until there and reset it to NIL at that
+place. This can't help the distinct/group by elimination cases but probably
+helps in some other cases, like reduce_unique_semijoins/remove_useless_joins,
+and it is semantically correct.
+
+
+5. As for inheritance tables, we first maintain the UniqueKey on the childrels
+as usual. But for a partitioned table we need to maintain 2 different kinds of
+UniqueKey: 1). the UniqueKey on the parent relation; 2). the UniqueKey on the
+child relations for partition-wise queries.
+
+Example:
+CREATE TABLE p (a int not null, b int not null) partition by list (a);
+CREATE TABLE p0 partition of p for values in (1);
+CREATE TABLE p1 partition of p for values in (2);
+
+create unique index p0_b on p0(b);
+create unique index p1_b on p1(b);
+
+Now b is only unique at the partition level, so the distinct can't be removed in
+the following case: SELECT DISTINCT b FROM p;
+
+Another example is SELECT DISTINCT a, b FROM p WHERE a = 1; Since only one
+partition is chosen, the UniqueKey on the child relation is the same as the
+UniqueKey on the parent relation.
+
+Another benefit of maintaining the UniqueKey at the partition level is that it
+is helpful for partition-wise joins.
+
+As for the UniqueKey at the parent table level, it comes in 2 different ways:
+1). the UniqueKey is derived from a unique index as well, but the index must be
+the same in all the related child relations and the unique index must contain
+the partition key. Example:
+
+CREATE UNIQUE INDEX p_ab ON p(a, b);  -- where a is the partition key.
+
+-- Query
+SELECT a, b FROM p; the (a, b) is a UniqueKey of p.
+
+2). If the parent relation has only one childrel, the UniqueKey on the childrel
+ is the UniqueKey on the parent as well.
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f43cd60cbd..509699f973 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -579,6 +579,12 @@ set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	 */
 	check_index_predicates(root, rel);
 
+	/*
+	 * Now that we've marked which partial indexes are suitable, we can now
+	 * build the relation's unique keys.
+	 */
+	populate_baserel_uniquekeys(root, rel, rel->indexlist);
+
 	/* Mark rel with estimated output rows, width, etc */
 	set_baserel_size_estimates(root, rel);
 }
@@ -1307,6 +1313,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Add paths to the append relation. */
 	add_paths_to_append_rel(root, rel, live_childrels);
+	if (IS_PARTITIONED_REL(rel))
+		populate_partitionedrel_uniquekeys(root, rel, live_childrels);
 }
 
 
@@ -2380,6 +2388,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 										  pathkeys, required_outer));
 	}
 
+	convert_subquery_uniquekeys(root, rel, sub_final_rel);
+
 	/* If outer rel allows parallelism, do same for partial paths. */
 	if (rel->consider_parallel && bms_is_empty(required_outer))
 	{
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index db54a6ba2e..ef0fd2fb0b 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -71,13 +71,6 @@ static void consider_parallel_mergejoin(PlannerInfo *root,
 static void hash_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
-static List *select_mergejoin_clauses(PlannerInfo *root,
-									  RelOptInfo *joinrel,
-									  RelOptInfo *outerrel,
-									  RelOptInfo *innerrel,
-									  List *restrictlist,
-									  JoinType jointype,
-									  bool *mergejoin_allowed);
 static void generate_mergejoin_paths(PlannerInfo *root,
 									 RelOptInfo *joinrel,
 									 RelOptInfo *innerrel,
@@ -1927,7 +1920,7 @@ hash_inner_and_outer(PlannerInfo *root,
  * if it is mergejoinable and involves vars from the two sub-relations
  * currently of interest.
  */
-static List *
+List *
 select_mergejoin_clauses(PlannerInfo *root,
 						 RelOptInfo *joinrel,
 						 RelOptInfo *outerrel,
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 2d343cd293..b9163ee8ff 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -924,6 +924,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
 
 	/* Apply partitionwise join technique, if possible. */
 	try_partitionwise_join(root, rel1, rel2, joinrel, sjinfo, restrictlist);
+
+	populate_joinrel_uniquekeys(root, joinrel, rel1, rel2, restrictlist, sjinfo->jointype);
 }
 
 
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index ce9bf87e9b..7e596d4194 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -33,7 +33,6 @@ static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
 											 int partkeycol);
-static Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 
 
@@ -1035,7 +1034,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
  * We need this to ensure that we don't return pathkeys describing values
  * that are unavailable above the level of the subquery scan.
  */
-static Var *
+Var *
 find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle)
 {
 	ListCell   *lc;
diff --git a/src/backend/optimizer/path/uniquekeys.c b/src/backend/optimizer/path/uniquekeys.c
new file mode 100644
index 0000000000..b33bcd2f32
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekeys.c
@@ -0,0 +1,1131 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekeys.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/appendinfo.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/tlist.h"
+#include "rewrite/rewriteManip.h"
+
+
+/*
+ * This struct is used to help populate_joinrel_uniquekeys.
+ *
+ * added_to_joinrel is true if a uniquekey (from outerrel or innerrel)
+ * has been added to joinrel.
+ * useful is true if the exprs of the uniquekey still appear in joinrel.
+ */
+typedef struct UniqueKeyContextData
+{
+	UniqueKey	*uniquekey;
+	bool	added_to_joinrel;
+	bool	useful;
+} *UniqueKeyContext;
+
+static List *initililze_uniquecontext_for_joinrel(RelOptInfo *inputrel);
+static bool innerrel_keeps_unique(PlannerInfo *root,
+								  RelOptInfo *outerrel,
+								  RelOptInfo *innerrel,
+								  List *restrictlist,
+								  bool reverse);
+
+static List *get_exprs_from_uniqueindex(IndexOptInfo *unique_index,
+										List *const_exprs,
+										List *const_expr_opfamilies,
+										Bitmapset *used_varattrs,
+										bool *useful,
+										bool *multi_nullvals);
+static List *get_exprs_from_uniquekey(RelOptInfo *joinrel,
+									  RelOptInfo *rel1,
+									  UniqueKey *ukey);
+static void add_uniquekey_for_onerow(RelOptInfo *rel);
+static bool add_combined_uniquekey(RelOptInfo *joinrel,
+								   RelOptInfo *outer_rel,
+								   RelOptInfo *inner_rel,
+								   UniqueKey *outer_ukey,
+								   UniqueKey *inner_ukey,
+								   JoinType jointype);
+
+/* Used for unique indexes checking for partitioned table */
+static bool index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind);
+static IndexOptInfo *simple_copy_indexinfo_to_parent(PlannerInfo *root,
+													 RelOptInfo *parentrel,
+													 IndexOptInfo *from);
+static bool simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2);
+static void adjust_partition_unique_indexlist(PlannerInfo *root,
+											  RelOptInfo *parentrel,
+											  RelOptInfo *childrel,
+											  List **global_unique_index);
+
+/* Helper function for grouped relation and distinct relation. */
+static void add_uniquekey_from_sortgroups(PlannerInfo *root,
+										  RelOptInfo *rel,
+										  List *sortgroups);
+
+/*
+ * populate_baserel_uniquekeys
+ *		Populate the uniquekeys list of 'baserel' by looking at the rel's
+ *		unique indexes and its baserestrictinfo
+ */
+void
+populate_baserel_uniquekeys(PlannerInfo *root,
+							RelOptInfo *baserel,
+							List *indexlist)
+{
+	ListCell *lc;
+	List	*matched_uniq_indexes = NIL;
+
+	/* Attrs appears in rel->reltarget->exprs. */
+	Bitmapset *used_attrs = NULL;
+
+	List	*const_exprs = NIL;
+	List	*expr_opfamilies = NIL;
+
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	foreach(lc, indexlist)
+	{
+		IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+		matched_uniq_indexes = lappend(matched_uniq_indexes, ind);
+	}
+
+	if (matched_uniq_indexes  == NIL)
+		return;
+
+	/* Check which attrs are used in baserel->reltarget */
+	pull_varattnos((Node *)baserel->reltarget->exprs, baserel->relid, &used_attrs);
+
+	/* Check which attrs are compared to a Const via a mergeable operator */
+	foreach(lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (rinfo->mergeopfamilies == NIL)
+			continue;
+
+		if (bms_is_empty(rinfo->left_relids))
+		{
+			const_exprs = lappend(const_exprs, get_rightop(rinfo->clause));
+		}
+		else if (bms_is_empty(rinfo->right_relids))
+		{
+			const_exprs = lappend(const_exprs, get_leftop(rinfo->clause));
+		}
+		else
+			continue;
+
+		expr_opfamilies = lappend(expr_opfamilies, rinfo->mergeopfamilies);
+	}
+
+	foreach(lc, matched_uniq_indexes)
+	{
+		bool	multi_nullvals, useful;
+		List	*exprs = get_exprs_from_uniqueindex(lfirst_node(IndexOptInfo, lc),
+													const_exprs,
+													expr_opfamilies,
+													used_attrs,
+													&useful,
+													&multi_nullvals);
+		if (useful)
+		{
+			if (exprs == NIL)
+			{
+				/* All the columns in Unique Index matched with a restrictinfo */
+				add_uniquekey_for_onerow(baserel);
+				return;
+			}
+			baserel->uniquekeys = lappend(baserel->uniquekeys,
+										  makeUniqueKey(exprs, multi_nullvals));
+		}
+	}
+}
+
+
+/*
+ * populate_partitionedrel_uniquekeys
+ * The UniqueKey on a partitioned rel comes from 2 cases:
+ * 1). Only one partition is involved in this query, so the unique keys can be
+ * copied to the parent rel from the childrel.
+ * 2). There are unique indexes which include the partition key and exist
+ * in all the related partitions.
+ * We ignore rule 2 if we hit rule 1.
+ */
+
+void
+populate_partitionedrel_uniquekeys(PlannerInfo *root,
+								   RelOptInfo *rel,
+								   List *childrels)
+{
+	ListCell	*lc;
+	List	*global_uniq_indexlist = NIL;
+	RelOptInfo *childrel;
+	bool is_first = true;
+
+	Assert(IS_PARTITIONED_REL(rel));
+
+	if (childrels == NIL)
+		return;
+
+	/*
+	 * If there is only one partition used in this query, the UniqueKey of the
+	 * childrel is still valid at the parent level, but we need to convert the
+	 * exprs from the child form to the parent form.
+	 */
+	if (list_length(childrels) == 1)
+	{
+		/* Check for Rule 1 */
+		RelOptInfo *childrel = linitial_node(RelOptInfo, childrels);
+		ListCell	*lc;
+		Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
+		if (relation_is_onerow(childrel))
+		{
+			add_uniquekey_for_onerow(rel);
+			return;
+		}
+
+		foreach(lc, childrel->uniquekeys)
+		{
+			UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+			AppendRelInfo *appinfo = find_appinfo_by_child(root, childrel->relid);
+			List *parent_exprs = NIL;
+			bool can_reuse = true;
+			ListCell	*lc2;
+			foreach(lc2, ukey->exprs)
+			{
+				Var *var = (Var *)lfirst(lc2);
+				/*
+				 * If the expr is not a plain Var, it is hard to build the
+				 * corresponding expression on the parent, so ignore that case for now.
+				 */
+				if(!IsA(var, Var))
+				{
+					can_reuse = false;
+					break;
+				}
+				/* Convert it to parent var */
+				parent_exprs = lappend(parent_exprs, find_parent_var(appinfo, var));
+			}
+			if (can_reuse)
+				rel->uniquekeys = lappend(rel->uniquekeys,
+										  makeUniqueKey(parent_exprs,
+														ukey->multi_nullvals));
+		}
+	}
+	else
+	{
+		/* Check for rule 2 */
+		childrel = linitial_node(RelOptInfo, childrels);
+		foreach(lc, childrel->indexlist)
+		{
+			IndexOptInfo *ind = lfirst(lc);
+			IndexOptInfo *modified_index;
+			if (!ind->unique || !ind->immediate ||
+				(ind->indpred != NIL && !ind->predOK))
+				continue;
+
+			/*
+			 * During simple_copy_indexinfo_to_parent, we need to convert Vars from
+			 * child to parent; an index on an expression is too complex to handle,
+			 * so ignore it for now.
+			 */
+			if (ind->indexprs != NIL)
+				continue;
+
+			modified_index = simple_copy_indexinfo_to_parent(root, rel, ind);
+			/*
+			 * If the unique index doesn't contain the partition key, then it is
+			 * unique on this partition only, so it is useless for us.
+			 */
+			if (!index_constains_partkey(rel, modified_index))
+				continue;
+
+			global_uniq_indexlist = lappend(global_uniq_indexlist,  modified_index);
+		}
+
+		if (global_uniq_indexlist != NIL)
+		{
+			foreach(lc, childrels)
+			{
+				RelOptInfo *child = lfirst(lc);
+				if (is_first)
+				{
+					is_first = false;
+					continue;
+				}
+				adjust_partition_unique_indexlist(root, rel, child, &global_uniq_indexlist);
+			}
+			/* Now we have a list of unique indexes which are exactly the same
+			 * on all childrels.  Set the UniqueKey just like for a
+			 * non-partitioned table. */
+			populate_baserel_uniquekeys(root, rel, global_uniq_indexlist);
+		}
+	}
+}
+
+
+/*
+ * populate_distinctrel_uniquekeys
+ */
+void
+populate_distinctrel_uniquekeys(PlannerInfo *root,
+								RelOptInfo *inputrel,
+								RelOptInfo *distinctrel)
+{
+	/* The unique key before the distinct is still valid. */
+	distinctrel->uniquekeys = list_copy(inputrel->uniquekeys);
+	add_uniquekey_from_sortgroups(root, distinctrel, root->parse->distinctClause);
+}
+
+/*
+ * populate_grouprel_uniquekeys
+ */
+void
+populate_grouprel_uniquekeys(PlannerInfo *root,
+							 RelOptInfo *grouprel,
+							 RelOptInfo *inputrel)
+
+{
+	Query *parse = root->parse;
+	bool input_ukey_added = false;
+	ListCell *lc;
+
+	if (relation_is_onerow(inputrel))
+	{
+		add_uniquekey_for_onerow(grouprel);
+		return;
+	}
+	if (parse->groupingSets)
+		return;
+
+	/* A Normal group by without grouping set. */
+	if (parse->groupClause)
+	{
+		/*
+		 * Currently, even if the group by clause is unique already, we still
+		 * have to create the grouprel if the query has an Aggref. To keep the
+		 * UniqueKey list short, we check whether a UniqueKey of input_rel is
+		 * still valid; if so, we reuse it.
+		 */
+		foreach(lc, inputrel->uniquekeys)
+		{
+			UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+			if (list_is_subset(ukey->exprs, grouprel->reltarget->exprs))
+			{
+				grouprel->uniquekeys = lappend(grouprel->uniquekeys,
+											   ukey);
+				input_ukey_added = true;
+			}
+		}
+		if (!input_ukey_added)
+			/*
+			 * The group by clause must be a superset of grouprel->reltarget->exprs
+			 * except for the aggregation exprs, so if such exprs are unique
+			 * already, don't bother generating a new uniquekey for them.
+			 */
+			add_uniquekey_from_sortgroups(root,
+										  grouprel,
+										  root->parse->groupClause);
+	}
+	else
+		/* It has aggregation but no group by clause, so only one row is returned */
+		add_uniquekey_for_onerow(grouprel);
+}
+
+/*
+ * simple_copy_uniquekeys
+ * Using a function for this one-line code makes it easy to check where we
+ * simply copied the uniquekeys.
+ */
+void
+simple_copy_uniquekeys(RelOptInfo *oldrel,
+					   RelOptInfo *newrel)
+{
+	newrel->uniquekeys = oldrel->uniquekeys;
+}
+
+/*
+ *  populate_unionrel_uniquekeys
+ */
+void
+populate_unionrel_uniquekeys(PlannerInfo *root,
+							  RelOptInfo *unionrel)
+{
+	ListCell	*lc;
+	List	*exprs = NIL;
+
+	Assert(unionrel->uniquekeys == NIL);
+
+	foreach(lc, unionrel->reltarget->exprs)
+	{
+		exprs = lappend(exprs, lfirst(lc));
+	}
+
+	if (exprs == NIL)
+		/* SQL: select union select; is valid, we need to handle it here. */
+		add_uniquekey_for_onerow(unionrel);
+	else
+		unionrel->uniquekeys = lappend(unionrel->uniquekeys,
+									   makeUniqueKey(exprs,false));
+
+}
+
+/*
+ * populate_joinrel_uniquekeys
+ *
+ * Populate uniquekeys for the joinrel. We check each input relation to see if
+ * its UniqueKey is still valid via innerrel_keeps_unique; if so, we add it to
+ * the joinrel.  The multi_nullvals field is changed to true for some outer
+ * join cases, and a one-row UniqueKey needs to be converted to a normal
+ * UniqueKey in those cases as well.
+ * For a uniquekey in either input rel which can't stay unique after the join,
+ * we still check whether the combination of UniqueKeys from both sides is
+ * useful for us; if yes, we add it to the joinrel as well.
+ */
+void
+populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List *restrictlist, JoinType jointype)
+{
+	ListCell *lc, *lc2;
+	List	*clause_list = NIL;
+	List	*outerrel_ukey_ctx;
+	List	*innerrel_ukey_ctx;
+	bool	inner_onerow, outer_onerow;
+	bool	mergejoin_allowed;
+
+	/* Care about the outerrel relation only for SEMI/ANTI join */
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+	{
+		foreach(lc, outerrel->uniquekeys)
+		{
+			UniqueKey	*uniquekey = lfirst_node(UniqueKey, lc);
+			if (list_is_subset(uniquekey->exprs, joinrel->reltarget->exprs))
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, uniquekey);
+		}
+		return;
+	}
+
+	Assert(jointype == JOIN_LEFT || jointype == JOIN_FULL || jointype == JOIN_INNER);
+
+	/* Fast path */
+	if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+		return;
+
+	inner_onerow = relation_is_onerow(innerrel);
+	outer_onerow = relation_is_onerow(outerrel);
+
+	outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(outerrel);
+	innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(innerrel);
+
+	clause_list = select_mergejoin_clauses(root, joinrel, outerrel, innerrel,
+										   restrictlist, jointype,
+										   &mergejoin_allowed);
+
+	if (innerrel_keeps_unique(root, innerrel, outerrel, clause_list, true /* reverse */))
+	{
+		bool outer_impact = jointype == JOIN_FULL;
+		foreach(lc, outerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+
+			/* The outer relation has one row and its unique key is not duplicated
+			 * after the join, so the joinrel still has one row unless
+			 * jointype == JOIN_FULL. */
+			if (outer_onerow && !outer_impact)
+			{
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else if (outer_onerow)
+			{
+				/*
+				 * The onerow outerrel becomes multiple rows and multi_nullvals
+				 * must be changed to true. We also need to set the exprs
+				 * correctly since they can't be NIL any more.
+				 */
+				ListCell *lc2;
+				foreach(lc2, get_exprs_from_uniquekey(joinrel, outerrel, NULL))
+				{
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(lfirst(lc2), true));
+				}
+			}
+			else
+			{
+				if (!ctx->uniquekey->multi_nullvals && outer_impact)
+					/* Change multi_nullvals to true due to the full join. */
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(ctx->uniquekey->exprs, true));
+				else
+					/* Just reuse it */
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  ctx->uniquekey);
+			}
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+	{
+		bool outer_impact = jointype == JOIN_FULL || jointype == JOIN_LEFT;
+
+		foreach(lc, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+
+			if (inner_onerow &&  !outer_impact)
+			{
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else if (inner_onerow)
+			{
+				ListCell *lc2;
+				foreach(lc2, get_exprs_from_uniquekey(joinrel, innerrel, NULL))
+				{
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(lfirst(lc2), true));
+				}
+			}
+			else
+			{
+				if (!ctx->uniquekey->multi_nullvals && outer_impact)
+					/* Need to change multi_nullvals to true due to the outer join. */
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(ctx->uniquekey->exprs,
+																true));
+				else
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  ctx->uniquekey);
+
+			}
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	/*
+	 * The combination of the UniqueKey from both sides is unique as well regardless
+	 * of join type, but no bother to add it if its subset has been added to joinrel
+	 * already or it is not useful for the joinrel.
+	 */
+	foreach(lc, outerrel_ukey_ctx)
+	{
+		UniqueKeyContext ctx1 = (UniqueKeyContext) lfirst(lc);
+		if (ctx1->added_to_joinrel || !ctx1->useful)
+			continue;
+		foreach(lc2, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx2 = (UniqueKeyContext) lfirst(lc2);
+			if (ctx2->added_to_joinrel || !ctx2->useful)
+				continue;
+			if (add_combined_uniquekey(joinrel, outerrel, innerrel,
+									   ctx1->uniquekey, ctx2->uniquekey,
+									   jointype))
+				/* If we set a onerow UniqueKey on joinrel, we don't need the others. */
+				return;
+		}
+	}
+}
+
+
+/*
+ * convert_subquery_uniquekeys
+ *
+ * Convert the UniqueKey in the subquery to the outer relation.
+ */
+void convert_subquery_uniquekeys(PlannerInfo *root,
+								 RelOptInfo *currel,
+								 RelOptInfo *sub_final_rel)
+{
+	ListCell	*lc;
+
+	if (sub_final_rel->uniquekeys == NIL)
+		return;
+
+	if (relation_is_onerow(sub_final_rel))
+	{
+		add_uniquekey_for_onerow(currel);
+		return;
+	}
+
+	Assert(currel->subroot != NULL);
+
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		ListCell	*lc;
+		List	*exprs = NIL;
+		bool	ukey_useful = true;
+
+		/* One row case is handled above */
+		Assert(ukey->exprs != NIL);
+		foreach(lc, ukey->exprs)
+		{
+			Var *var;
+			TargetEntry *tle = tlist_member(lfirst(lc),
+											currel->subroot->processed_tlist);
+			if (tle == NULL)
+			{
+				ukey_useful = false;
+				break;
+			}
+			var = find_var_for_subquery_tle(currel, tle);
+			if (var == NULL)
+			{
+				ukey_useful = false;
+				break;
+			}
+			exprs = lappend(exprs, var);
+		}
+
+		if (ukey_useful)
+			currel->uniquekeys = lappend(currel->uniquekeys,
+										 makeUniqueKey(exprs,
+													   ukey->multi_nullvals));
+
+	}
+}
+
+/*
+ * innerrel_keeps_unique
+ *
+ * Check if the innerrel's UniqueKeys remain valid after the join.  They do if,
+ * for some UniqueKey of the outerrel, every expression of that UniqueKey is
+ * matched by a mergeable clause in clause_list.
+ *
+ * Note: clause_list must already contain only mergeable RestrictInfos.
+ */
+static bool
+innerrel_keeps_unique(PlannerInfo *root,
+					  RelOptInfo *outerrel,
+					  RelOptInfo *innerrel,
+					  List *clause_list,
+					  bool reverse)
+{
+	ListCell	*lc, *lc2, *lc3;
+
+	if (outerrel->uniquekeys == NIL || innerrel->uniquekeys == NIL)
+		return false;
+
+	/* Check if there is outerrel's uniquekey in mergeable clause. */
+	foreach(lc, outerrel->uniquekeys)
+	{
+		List	*outer_uq_exprs = lfirst_node(UniqueKey, lc)->exprs;
+		bool clauselist_matchs_all_exprs = true;
+		foreach(lc2, outer_uq_exprs)
+		{
+			Node *outer_uq_expr = lfirst(lc2);
+			bool find_uq_expr_in_clauselist = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *outer_expr;
+				if (reverse)
+					outer_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				else
+					outer_expr = rinfo->outer_is_left ? get_leftop(rinfo->clause) : get_rightop(rinfo->clause);
+				if (equal(outer_expr, outer_uq_expr))
+				{
+					find_uq_expr_in_clauselist = true;
+					break;
+				}
+			}
+			if (!find_uq_expr_in_clauselist)
+			{
+				/* No need to check the next exprs in the current uniquekey */
+				clauselist_matchs_all_exprs = false;
+				break;
+			}
+		}
+
+		if (clauselist_matchs_all_exprs)
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * relation_is_onerow
+ * Check if it is a one-row relation by checking UniqueKey.
+ */
+bool
+relation_is_onerow(RelOptInfo *rel)
+{
+	UniqueKey *ukey;
+	if (rel->uniquekeys == NIL)
+		return false;
+	ukey = linitial_node(UniqueKey, rel->uniquekeys);
+	return ukey->exprs == NIL && list_length(rel->uniquekeys) == 1;
+}
+
+/*
+ * relation_has_uniquekeys_for
+ *		Returns true if we have proofs that 'rel' cannot return multiple rows with
+ *		the same values in each of 'exprs'.  Otherwise returns false.
+ */
+bool
+relation_has_uniquekeys_for(PlannerInfo *root, RelOptInfo *rel,
+							List *exprs, bool allow_multinulls)
+{
+	ListCell *lc;
+
+	/*
+	 * For the UniqueKey onerow case, uniquekey->exprs is empty as well,
+	 * so we can't rely on list_is_subset to handle this special case.
+	 */
+	if (exprs == NIL)
+		return false;
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		if (ukey->multi_nullvals && !allow_multinulls)
+			continue;
+		if (list_is_subset(ukey->exprs, exprs))
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * get_exprs_from_uniqueindex
+ *
+ * Return a list of exprs which together are unique.  Set *useful to false
+ * if this unique index is not useful for us.
+ */
+static List *
+get_exprs_from_uniqueindex(IndexOptInfo *unique_index,
+						   List *const_exprs,
+						   List *const_expr_opfamilies,
+						   Bitmapset *used_varattrs,
+						   bool *useful,
+						   bool *multi_nullvals)
+{
+	List	*exprs = NIL;
+	ListCell	*indexpr_item;
+	int	c = 0;
+
+	*useful = true;
+	*multi_nullvals = false;
+
+	indexpr_item = list_head(unique_index->indexprs);
+	for(c = 0; c < unique_index->ncolumns; c++)
+	{
+		int attr = unique_index->indexkeys[c];
+		Expr *expr;
+		bool	matched_const = false;
+		ListCell	*lc1, *lc2;
+
+		if(attr > 0)
+		{
+			expr = list_nth_node(TargetEntry, unique_index->indextlist, c)->expr;
+		}
+		else if (attr == 0)
+		{
+			/* Expression index */
+			expr = lfirst(indexpr_item);
+			indexpr_item = lnext(unique_index->indexprs, indexpr_item);
+		}
+		else /* attr < 0 */
+		{
+			/* Index on system column is not supported */
+			Assert(false);
+		}
+
+		/*
+		 * Check the index_col = Const case with regard to the opfamily, so
+		 * that we can remove the index_col from the final UniqueKey->exprs.
+		 */
+		forboth(lc1, const_exprs, lc2, const_expr_opfamilies)
+		{
+			if (list_member_oid((List *)lfirst(lc2), unique_index->opfamily[c])
+				&& match_index_to_operand((Node *) lfirst(lc1), c, unique_index))
+			{
+				matched_const = true;
+				break;
+			}
+		}
+
+		if (matched_const)
+			continue;
+
+		/* Check if the indexed expr is used in rel */
+		if (attr > 0)
+		{
+			/*
+			 * Normal indexed column: if the column is not used, then the index
+			 * is useless for the UniqueKey.
+			 */
+			attr -= FirstLowInvalidHeapAttributeNumber;
+
+			if (!bms_is_member(attr, used_varattrs))
+			{
+				*useful = false;
+				break;
+			}
+		}
+		else if (!list_member(unique_index->rel->reltarget->exprs, expr))
+		{
+			/* Expression index but the expression is not used in rel */
+			*useful = false;
+			break;
+		}
+
+		/* check not null property. */
+		if (attr == 0)
+		{
+			/* We never know whether an expression yields null or not */
+			*multi_nullvals = true;
+		}
+		else if (!bms_is_member(attr, unique_index->rel->notnullattrs)
+				 && !bms_is_member(0 - FirstLowInvalidHeapAttributeNumber,
+								   unique_index->rel->notnullattrs))
+		{
+			*multi_nullvals = true;
+		}
+
+		exprs = lappend(exprs, expr);
+	}
+	return exprs;
+}
+
+
+/*
+ * add_uniquekey_for_onerow
+ * If we are sure that the relation only returns one row, then all the columns
+ * are unique.  However we don't need to create a UniqueKey for every column; we
+ * just set exprs = NIL and overwrite all the other UniqueKeys on this RelOptInfo,
+ * since this one has the strongest semantics.
+ */
+void
+add_uniquekey_for_onerow(RelOptInfo *rel)
+{
+	/*
+	 * We overwrite the previous UniqueKey on purpose since this one has the
+	 * strongest semantics.
+	 */
+	rel->uniquekeys = list_make1(makeUniqueKey(NIL, false));
+}
+
+
+/*
+ * initililze_uniquecontext_for_joinrel
+ * Return a List of UniqueKeyContext for an inputrel
+ */
+static List *
+initililze_uniquecontext_for_joinrel(RelOptInfo *inputrel)
+{
+	List	*res = NIL;
+	ListCell *lc;
+	foreach(lc,  inputrel->uniquekeys)
+	{
+		UniqueKeyContext context;
+		context = palloc(sizeof(struct UniqueKeyContextData));
+		context->uniquekey = lfirst_node(UniqueKey, lc);
+		context->added_to_joinrel = false;
+		context->useful = true;
+		res = lappend(res, context);
+	}
+	return res;
+}
+
+
+/*
+ * get_exprs_from_uniquekey
+ *	Unify the way of getting a List of exprs from a one-row UniqueKey or a
+ * normal UniqueKey.  For the onerow case, every expr in rel1 is a valid
+ * UniqueKey.  Return a List of exprs.
+ *
+ * rel1: The relation whose exprs we want to get.
+ * ukey: The UniqueKey whose exprs we want to get.
+ */
+static List *
+get_exprs_from_uniquekey(RelOptInfo *joinrel, RelOptInfo *rel1, UniqueKey *ukey)
+{
+	ListCell *lc;
+	bool onerow = rel1 != NULL && relation_is_onerow(rel1);
+
+	List	*res = NIL;
+	Assert(onerow || ukey);
+	if (onerow)
+	{
+		/* Only care about the exprs that still exist in the joinrel */
+		foreach(lc, joinrel->reltarget->exprs)
+		{
+			Bitmapset *relids = pull_varnos(lfirst(lc));
+			if (bms_is_subset(relids, rel1->relids))
+			{
+				res = lappend(res, list_make1(lfirst(lc)));
+			}
+		}
+	}
+	else
+	{
+		res = list_make1(ukey->exprs);
+	}
+	return res;
+}
+
+/*
+ * Partitioned table Unique Keys.
+ * A partitioned table's unique keys are maintained as follows:
+ * 1. The index must be unique, as usual.
+ * 2. The index must contain the partition key.
+ * 3. The index must exist on all the child rels.  See simple_indexinfo_equal
+ *    for how we compare them.
+ */
+
+/*
+ * index_constains_partkey
+ * Return true if the index contains the partition key.
+ */
+static bool
+index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind)
+{
+	ListCell	*lc;
+	int	i;
+	Assert(IS_PARTITIONED_REL(partrel));
+	Assert(partrel->part_scheme->partnatts > 0);
+
+	for(i = 0; i < partrel->part_scheme->partnatts; i++)
+	{
+		Node *part_expr = linitial(partrel->partexprs[i]);
+		bool found_in_index = false;
+		foreach(lc, ind->indextlist)
+		{
+			Expr *index_expr = lfirst_node(TargetEntry, lc)->expr;
+			if (equal(index_expr, part_expr))
+			{
+				found_in_index = true;
+				break;
+			}
+		}
+		if (!found_in_index)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * simple_indexinfo_equal
+ *
+ * Check whether the two indexes are the same as each other.  The indexes here
+ * are COPIED from the childrel with some tiny changes (see
+ * simple_copy_indexinfo_to_parent).
+ */
+static bool
+simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2)
+{
+	Size oid_cmp_len = sizeof(Oid) * ind1->ncolumns;
+
+	return ind1->ncolumns == ind2->ncolumns &&
+		ind1->unique == ind2->unique &&
+		memcmp(ind1->indexkeys, ind2->indexkeys, sizeof(int) * ind1->ncolumns) == 0 &&
+		memcmp(ind1->opfamily, ind2->opfamily, oid_cmp_len) == 0 &&
+		memcmp(ind1->opcintype, ind2->opcintype, oid_cmp_len) == 0 &&
+		memcmp(ind1->sortopfamily, ind2->sortopfamily, oid_cmp_len) == 0 &&
+		equal(get_tlist_exprs(ind1->indextlist, true),
+			  get_tlist_exprs(ind2->indextlist, true));
+}
+
+
+/*
+ * The macros below are used by simple_copy_indexinfo_to_parent, which is so
+ * customized that I don't want to put it into copyfuncs.c, so they are copied here.
+ */
+#define COPY_POINTER_FIELD(fldname, sz) \
+	do { \
+		Size	_size = (sz); \
+		newnode->fldname = palloc(_size); \
+		memcpy(newnode->fldname, from->fldname, _size); \
+	} while (0)
+
+#define COPY_NODE_FIELD(fldname) \
+	(newnode->fldname = copyObjectImpl(from->fldname))
+
+#define COPY_SCALAR_FIELD(fldname) \
+	(newnode->fldname = from->fldname)
+
+
+/*
+ * simple_copy_indexinfo_to_parent (from partition)
+ * Copy the IndexOptInfo from the child relation to the parent relation, with
+ * some modifications.  The copy is used to test:
+ * 1. Whether the same index exists in all the childrels.
+ * 2. Whether the parentrel's reltarget/baserestrictinfo matches this index.
+ */
+static IndexOptInfo *
+simple_copy_indexinfo_to_parent(PlannerInfo *root,
+								RelOptInfo *parentrel,
+								IndexOptInfo *from)
+{
+	IndexOptInfo *newnode = makeNode(IndexOptInfo);
+	AppendRelInfo *appinfo = find_appinfo_by_child(root, from->rel->relid);
+	ListCell	*lc;
+	int	idx = 0;
+
+	COPY_SCALAR_FIELD(ncolumns);
+	COPY_SCALAR_FIELD(nkeycolumns);
+	COPY_SCALAR_FIELD(unique);
+	COPY_SCALAR_FIELD(immediate);
+	/* We just need to know if it is NIL or not */
+	COPY_SCALAR_FIELD(indpred);
+	COPY_SCALAR_FIELD(predOK);
+	COPY_POINTER_FIELD(indexkeys, from->ncolumns * sizeof(int));
+	COPY_POINTER_FIELD(indexcollations, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opfamily, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opcintype, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(sortopfamily, from->ncolumns * sizeof(Oid));
+	COPY_NODE_FIELD(indextlist);
+
+	/* Convert index exprs on child expr to expr on parent */
+	foreach(lc, newnode->indextlist)
+	{
+		TargetEntry *tle = lfirst_node(TargetEntry, lc);
+		/* Index on expression is ignored */
+		Assert(IsA(tle->expr, Var));
+		tle->expr = (Expr *) find_parent_var(appinfo, (Var *) tle->expr);
+		newnode->indexkeys[idx] = castNode(Var, tle->expr)->varattno;
+		idx++;
+	}
+	newnode->rel = parentrel;
+	return newnode;
+}
+
+/*
+ * adjust_partition_unique_indexlist
+ *
+ * global_unique_indexes: Initially this contains the copied & modified unique
+ * indexes from the first partition.  We then check whether each index in it
+ * still exists in the following partitions; if not, it is removed.  At the end
+ * it holds the list of indexes which exist in all the partitions.
+ */
+static void
+adjust_partition_unique_indexlist(PlannerInfo *root,
+								  RelOptInfo *parentrel,
+								  RelOptInfo *childrel,
+								  List **global_unique_indexes)
+{
+	ListCell	*lc, *lc2;
+	foreach(lc, *global_unique_indexes)
+	{
+		IndexOptInfo	*g_ind = lfirst_node(IndexOptInfo, lc);
+		bool found_in_child = false;
+
+		foreach(lc2, childrel->indexlist)
+		{
+			IndexOptInfo   *p_ind = lfirst_node(IndexOptInfo, lc2);
+			IndexOptInfo   *p_ind_copy;
+			if (!p_ind->unique || !p_ind->immediate ||
+				(p_ind->indpred != NIL && !p_ind->predOK))
+				continue;
+			p_ind_copy = simple_copy_indexinfo_to_parent(root, parentrel, p_ind);
+			if (simple_indexinfo_equal(p_ind_copy, g_ind))
+			{
+				found_in_child = true;
+				break;
+			}
+		}
+		if (!found_in_child)
+			/* The index doesn't exist in childrel, remove it from global_unique_indexes */
+			*global_unique_indexes = foreach_delete_current(*global_unique_indexes, lc);
+	}
+}
+
+/* Helper function for groupres/distinctrel */
+static void
+add_uniquekey_from_sortgroups(PlannerInfo *root, RelOptInfo *rel, List *sortgroups)
+{
+	Query *parse = root->parse;
+	List	*exprs;
+
+	/*
+	 * XXX: If there are some vars which are not at the current query level, the
+	 * semantics are imprecise; should we avoid this or not?  levelsup = 1 is just
+	 * a demo; maybe we need to check every level other than 0, in which case it
+	 * looks like we have to write another pull_var_walker.
+	 */
+	List	*upper_vars = pull_vars_of_level((Node*)sortgroups, 1);
+
+	if (upper_vars != NIL)
+		return;
+
+	exprs = get_sortgrouplist_exprs(sortgroups, parse->targetList);
+	rel->uniquekeys = lappend(rel->uniquekeys,
+							  makeUniqueKey(exprs,
+											false /* sortgroupclause can't be multi_nullvals */));
+}
+
+
+/*
+ * add_combined_uniquekey
+ * The combination of both UniqueKeys is a valid UniqueKey for the joinrel,
+ * regardless of the jointype.
+ */
+bool
+add_combined_uniquekey(RelOptInfo *joinrel,
+					   RelOptInfo *outer_rel,
+					   RelOptInfo *inner_rel,
+					   UniqueKey *outer_ukey,
+					   UniqueKey *inner_ukey,
+					   JoinType jointype)
+{
+
+	ListCell	*lc1, *lc2;
+
+	/* If either side has multi_nullvals, or we have an outer join,
+	 * the combined UniqueKey has multi_nullvals. */
+	bool multi_nullvals = outer_ukey->multi_nullvals ||
+		inner_ukey->multi_nullvals || IS_OUTER_JOIN(jointype);
+
+	/* The only case where we can get a onerow joinrel after the join */
+	if  (relation_is_onerow(outer_rel)
+		 && relation_is_onerow(inner_rel)
+		 && jointype == JOIN_INNER)
+	{
+		add_uniquekey_for_onerow(joinrel);
+		return true;
+	}
+
+	foreach(lc1, get_exprs_from_uniquekey(joinrel, outer_rel, outer_ukey))
+	{
+		foreach(lc2, get_exprs_from_uniquekey(joinrel, inner_rel, inner_ukey))
+		{
+			List *exprs = list_concat_copy(lfirst_node(List, lc1), lfirst_node(List, lc2));
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+										  makeUniqueKey(exprs,
+														multi_nullvals));
+		}
+	}
+	return false;
+}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4131019fc9..33d188b17f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2389,6 +2389,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		add_path(final_rel, path);
 	}
 
+	simple_copy_uniquekeys(current_rel, final_rel);
+
 	/*
 	 * Generate partial paths for final_rel, too, if outer query levels might
 	 * be able to make use of them.
@@ -3899,6 +3901,8 @@ create_grouping_paths(PlannerInfo *root,
 	}
 
 	set_cheapest(grouped_rel);
+
+	populate_grouprel_uniquekeys(root, grouped_rel, input_rel);
 	return grouped_rel;
 }
 
@@ -4615,7 +4619,7 @@ create_window_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(window_rel);
-
+	simple_copy_uniquekeys(input_rel, window_rel);
 	return window_rel;
 }
 
@@ -4911,7 +4915,7 @@ create_distinct_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(distinct_rel);
-
+	populate_distinctrel_uniquekeys(root, input_rel, distinct_rel);
 	return distinct_rel;
 }
 
@@ -5172,6 +5176,8 @@ create_ordered_paths(PlannerInfo *root,
 	 */
 	Assert(ordered_rel->pathlist != NIL);
 
+	simple_copy_uniquekeys(input_rel, ordered_rel);
+
 	return ordered_rel;
 }
 
@@ -6049,6 +6055,9 @@ adjust_paths_for_srfs(PlannerInfo *root, RelOptInfo *rel,
 	if (list_length(targets) == 1)
 		return;
 
+	/* UniqueKey is not valid after handling the SRF. */
+	rel->uniquekeys = NIL;
+
 	/*
 	 * Stack SRF-evaluation nodes atop each path for the rel.
 	 *
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 951aed80e7..e94e92937c 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -689,6 +689,8 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	/* Undo effects of possibly forcing tuple_fraction to 0 */
 	root->tuple_fraction = save_fraction;
 
+	/* Add the UniqueKeys */
+	populate_unionrel_uniquekeys(root, result_rel);
 	return result_rel;
 }
 
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index d722063cf3..44c37ecffc 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -746,3 +746,47 @@ find_appinfos_by_relids(PlannerInfo *root, Relids relids, int *nappinfos)
 	}
 	return appinfos;
 }
+
+/*
+ * find_appinfo_by_child
+ * Find the AppendRelInfo whose child relid is child_index.
+ */
+AppendRelInfo *
+find_appinfo_by_child(PlannerInfo *root, Index child_index)
+{
+	ListCell	*lc;
+	foreach(lc, root->append_rel_list)
+	{
+		AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
+		if (appinfo->child_relid == child_index)
+			return appinfo;
+	}
+	elog(ERROR, "parent relation can't be found");
+	return NULL;
+}
+
+/*
+ * find_parent_var
+ * Translate a Var of the child rel into the corresponding Var of the parent rel.
+ */
+Var *
+find_parent_var(AppendRelInfo *appinfo, Var *child_var)
+{
+	ListCell	*lc;
+	Var	*res = NULL;
+	Index attno = 1;
+	foreach(lc, appinfo->translated_vars)
+	{
+		Node *child_node = lfirst(lc);
+		if (equal(child_node, child_var))
+		{
+			res = copyObject(child_var);
+			res->varattno = attno;
+			res->varno = appinfo->parent_relid;
+		}
+		attno++;
+	}
+	if (res == NULL)
+		elog(ERROR, "parent var can't be found");
+	return res;
+}
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 3132fd35a5..d66b40ec50 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -736,6 +736,7 @@ apply_child_basequals(PlannerInfo *root, RelOptInfo *parentrel,
 		{
 			Node	   *onecq = (Node *) lfirst(lc2);
 			bool		pseudoconstant;
+			RestrictInfo	*child_rinfo;
 
 			/* check for pseudoconstant (no Vars or volatile functions) */
 			pseudoconstant =
@@ -747,13 +748,14 @@ apply_child_basequals(PlannerInfo *root, RelOptInfo *parentrel,
 				root->hasPseudoConstantQuals = true;
 			}
 			/* reconstitute RestrictInfo with appropriate properties */
-			childquals = lappend(childquals,
-								 make_restrictinfo((Expr *) onecq,
-												   rinfo->is_pushed_down,
-												   rinfo->outerjoin_delayed,
-												   pseudoconstant,
-												   rinfo->security_level,
-												   NULL, NULL, NULL));
+			child_rinfo =  make_restrictinfo((Expr *) onecq,
+											 rinfo->is_pushed_down,
+											 rinfo->outerjoin_delayed,
+											 pseudoconstant,
+											 rinfo->security_level,
+											 NULL, NULL, NULL);
+			child_rinfo->mergeopfamilies = rinfo->mergeopfamilies;
+			childquals = lappend(childquals, child_rinfo);
 			/* track minimum security level among child quals */
 			cq_min_security = Min(cq_min_security, rinfo->security_level);
 		}
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 31d9aedeeb..c83f17acb7 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -16,6 +16,7 @@
 
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 
 extern A_Expr *makeA_Expr(A_Expr_Kind kind, List *name,
@@ -105,4 +106,6 @@ extern GroupingSet *makeGroupingSet(GroupingSetKind kind, List *content, int loc
 
 extern VacuumRelation *makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols);
 
+extern UniqueKey* makeUniqueKey(List *exprs, bool multi_nullvals);
+
 #endif							/* MAKEFUNC_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b4e4..41110ed888 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -264,6 +264,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 9e3ebd488a..02e4458bef 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -730,6 +730,7 @@ typedef struct RelOptInfo
 	QualCost	baserestrictcost;	/* cost of evaluating the above */
 	Index		baserestrict_min_security;	/* min security_level found in
 											 * baserestrictinfo */
+	List	   *uniquekeys;		/* List of UniqueKey */
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
 	bool		has_eclass_joins;	/* T means joininfo is incomplete */
@@ -1047,6 +1048,28 @@ typedef struct PathKey
 } PathKey;
 
 
+/*
+ * UniqueKey
+ *
+ * Represents the unique properties held by a RelOptInfo.
+ *
+ * exprs is a list of exprs which is unique on current RelOptInfo. exprs = NIL
+ * is a special case of UniqueKey, which means there is only 1 row in that
+ * relation.
+ * multi_nullvals: true means multi null values may exist in these exprs, so the
+ * uniqueness is not guaranteed in this case. This field is necessary for
+ * remove_useless_join & reduce_unique_semijoins where we don't mind these
+ * duplicated NULL values. It is set to true for 2 cases. One is a unique key
+ * from a unique index but the related column is nullable. The other one is for
+ * outer join. see populate_joinrel_uniquekeys for detail.
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+	List	   *exprs;
+	bool		multi_nullvals;
+} UniqueKey;
+
 /*
  * PathTarget
  *
@@ -2473,7 +2496,7 @@ typedef enum
  *
  * flags indicating what kinds of grouping are possible.
  * partial_costs_set is true if the agg_partial_costs and agg_final_costs
- * 		have been initialized.
+ *		have been initialized.
  * agg_partial_costs gives partial aggregation costs.
  * agg_final_costs gives finalization costs.
  * target_parallel_safe is true if target is parallel safe.
@@ -2503,8 +2526,8 @@ typedef struct
  * limit_tuples is an estimated bound on the number of output tuples,
  *		or -1 if no LIMIT or couldn't estimate.
  * count_est and offset_est are the estimated values of the LIMIT and OFFSET
- * 		expressions computed by preprocess_limit() (see comments for
- * 		preprocess_limit() for more information).
+ *		expressions computed by preprocess_limit() (see comments for
+ *		preprocess_limit() for more information).
  */
 typedef struct
 {
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 14ea2766ad..621f54a9f8 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -528,6 +528,8 @@ extern bool list_member_ptr(const List *list, const void *datum);
 extern bool list_member_int(const List *list, int datum);
 extern bool list_member_oid(const List *list, Oid datum);
 
+extern bool list_is_subset(const List *members, const List *target);
+
 extern List *list_delete(List *list, void *datum);
 extern List *list_delete_ptr(List *list, void *datum);
 extern List *list_delete_int(List *list, int datum);
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index d6a27a60dd..e87c92a054 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -32,4 +32,7 @@ extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 extern AppendRelInfo **find_appinfos_by_relids(PlannerInfo *root,
 											   Relids relids, int *nappinfos);
 
+extern AppendRelInfo *find_appinfo_by_child(PlannerInfo *root, Index child_index);
+extern Var *find_parent_var(AppendRelInfo *appinfo, Var *child_var);
+
 #endif							/* APPENDINFO_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 3e4171056e..9445141263 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -23,6 +23,7 @@
 #define OPTIMIZER_H
 
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 /*
  * We don't want to include nodes/pathnodes.h here, because non-planner
@@ -156,6 +157,7 @@ extern TargetEntry *get_sortgroupref_tle(Index sortref,
 										 List *targetList);
 extern TargetEntry *get_sortgroupclause_tle(SortGroupClause *sgClause,
 											List *targetList);
+extern Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle);
 extern Node *get_sortgroupclause_expr(SortGroupClause *sgClause,
 									  List *targetList);
 extern List *get_sortgrouplist_exprs(List *sgClauses,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 10b6e81079..9217a8d6c6 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -240,5 +240,48 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 									   int strategy, bool nulls_first);
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
+extern List *select_mergejoin_clauses(PlannerInfo *root,
+									  RelOptInfo *joinrel,
+									  RelOptInfo *outerrel,
+									  RelOptInfo *innerrel,
+									  List *restrictlist,
+									  JoinType jointype,
+									  bool *mergejoin_allowed);
+
+/*
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ */
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel,
+										List* unique_index_list);
+extern void populate_partitionedrel_uniquekeys(PlannerInfo *root,
+												RelOptInfo *rel,
+												List *childrels);
+extern void populate_distinctrel_uniquekeys(PlannerInfo *root,
+											RelOptInfo *inputrel,
+											RelOptInfo *distinctrel);
+extern void populate_grouprel_uniquekeys(PlannerInfo *root,
+										 RelOptInfo *grouprel,
+										 RelOptInfo *inputrel);
+extern void populate_unionrel_uniquekeys(PlannerInfo *root,
+										  RelOptInfo *unionrel);
+extern void simple_copy_uniquekeys(RelOptInfo *oldrel,
+								   RelOptInfo *newrel);
+extern void convert_subquery_uniquekeys(PlannerInfo *root,
+										RelOptInfo *currel,
+										RelOptInfo *sub_final_rel);
+extern void populate_joinrel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *rel1,
+										RelOptInfo *rel2,
+										List *restrictlist,
+										JoinType jointype);
+
+extern bool relation_has_uniquekeys_for(PlannerInfo *root,
+										RelOptInfo *rel,
+										List *exprs,
+										bool allow_multinulls);
+extern bool relation_is_onerow(RelOptInfo *rel);
 
 #endif							/* PATHS_H */
-- 
2.21.0
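As an aside for reviewers: the lookup rule implemented by relation_has_uniquekeys_for() in the patch above reduces to a subset test over the rel's UniqueKeys. Below is a minimal Python model of that rule; it is an illustration under assumed semantics only, not the C implementation, and the names merely mirror the patch.

```python
# Toy model of relation_has_uniquekeys_for(): a relation carries a list of
# UniqueKeys; a query's expression set is provably unique if it is a
# superset of some UniqueKey (and duplicate NULL groups are acceptable).
# Illustrative only -- NOT the PostgreSQL C code.

from dataclasses import dataclass

@dataclass(frozen=True)
class UniqueKey:
    exprs: frozenset          # expressions known to be unique together
    multi_nullvals: bool      # may contain duplicate NULL groups

def has_uniquekeys_for(uniquekeys, exprs, allow_multinulls=False):
    """True if 'exprs' (a set of expr names) is provably unique."""
    if not exprs:             # the onerow case is handled separately
        return False
    for ukey in uniquekeys:
        if ukey.multi_nullvals and not allow_multinulls:
            continue
        if ukey.exprs <= exprs:   # subset check, like list_is_subset()
            return True
    return False

# t1 has a primary key on (a): DISTINCT a, b would be redundant...
ukeys = [UniqueKey(frozenset({"a"}), multi_nullvals=False)]
print(has_uniquekeys_for(ukeys, {"a", "b"}))   # True
# ...but DISTINCT b alone is not provably unique.
print(has_uniquekeys_for(ukeys, {"b"}))        # False
```

Running it shows why a DISTINCT over (a, b) can be elided when (a) alone is a UniqueKey, while a DISTINCT over (b) cannot.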

Attachment: v9-0006-Join-removal-at-run-time-with-UniqueKey.patch (application/octet-stream)
From 377371e8822cbb92ee87dc4ad45420235c122fbb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Thu, 7 May 2020 08:36:29 +0800
Subject: [PATCH v9 6/6] Join removal at run-time with UniqueKey.

We add another join removal pass during build_join_rel to exploit
the benefits of UniqueKey, in addition to remove_useless_joins.
However, this new strategy can't be a replacement for the current
one, since we only know about 2 relations at that time, so it
can't handle something like

SELECT a.* FROM a LEFT JOIN (b LEFT JOIN c ON b.c_id = c.id)
ON (a.b_id = b.id);

However it can handle more cases than the current strategy like

SELECT  t1.a
FROM m3 t1
LEFT JOIN (SELECT m1.a FROM m1, m2 WHERE m1.b = m2.a) t2;
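To make the removability condition concrete, here is a toy Python sketch of the test this patch performs at join time; the function and its arguments are invented for illustration and do not appear in the patch.

```python
# Toy version of the run-time removability test: a LEFT JOIN's inner side
# can be dropped when (a) no column of the inner rel is needed above the
# join and (b) the join clauses cover some UniqueKey of the inner rel, so
# each outer row matches at most one inner row.  Purely illustrative;
# names are made up, not the patch's.

def join_can_be_removed(jointype, inner_uniquekeys, join_clause_inner_cols,
                        cols_needed_above, inner_cols):
    if jointype != "LEFT":
        return False
    if cols_needed_above & inner_cols:          # inner output is needed
        return False
    # Some UniqueKey of the inner rel must be fully covered by join clauses.
    return any(uk <= join_clause_inner_cols for uk in inner_uniquekeys)

# SELECT a.* FROM a LEFT JOIN b ON a.b_id = b.id, with b.id unique:
print(join_can_be_removed("LEFT",
                          inner_uniquekeys=[{"id"}],
                          join_clause_inner_cols={"id"},
                          cols_needed_above={"a.x"},
                          inner_cols={"id", "y"}))   # True
```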

As for the implementation, it is a very PoC version. The main
idea is that if join_canbe_removed() returns true, we still need a
joinrel, but one carrying the information of the outerrel only.
However, there is a lot of work to get there:

1. If the innerrel can be removed, we don't need to build a pathlist for the
   joinrel; we just reuse the pathlist from the outerrel. However, many places
   assert rel->pathlist[*]->parent == rel, so I copied the paths and changed
   their parent to the joinrel.
2. When creating a plan for some paths on an RTE_RELATION, we need to know the
   relation Oid via path->parent->relid, so we have to overwrite the joinrel's
   relid (which was 0) with the outerrel->relid.
3. For almost the same paths as in item 2, there is usually an assertion that
   best_path->parent->rtekind == RTE_RELATION; now such a path may appear in a
   joinrel, so I overwrote joinrel->rtekind with the outerrel's rtekind.
4. I guess there are some dependencies between path->pathtarget and
   rel->reltarget. Since we reuse the pathlist of the outerrel, I used the
   outerrel's reltarget as well. If the join can be removed, I expect
   list_length(outerrel->reltarget->exprs) >= list_length(joinrel->reltarget->exprs);
   we can rely on a ProjectionPath to reduce the tlist.
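This run-time removal leans on UniqueKeys being propagated through joins. The propagation rule from the previous patch's add_combined_uniquekey() (the concatenation of the two sides' keys stays unique regardless of join type) can be sanity-checked on toy data; the Python snippet below is an illustration only, with made-up data and a made-up helper.

```python
# Toy check of the rule behind add_combined_uniquekey(): if the outer rel
# is unique on key KO and the inner rel is unique on KI, then every row of
# any join of the two is unique on KO+KI, even in the worst case (a cross
# join).  Illustrative only -- not the patch's code.

def rows_unique_on(rows, keycols):
    """True if no two rows share the same values in 'keycols'."""
    keys = [tuple(r[c] for c in keycols) for r in rows]
    return len(keys) == len(set(keys))

outer = [{"id": 1, "x": 7}, {"id": 2, "x": 7}]        # unique on (id)
inner = [{"pk": 10, "y": 0}, {"pk": 11, "y": 0}]      # unique on (pk)

# A plain cross join -- the most duplication a join can produce.
joined = [{**o, **i} for o in outer for i in inner]

print(rows_unique_on(joined, ("id", "pk")))   # True: combined key holds
print(rows_unique_on(joined, ("id",)))        # False: outer key alone fails
```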
---
 src/backend/optimizer/path/joinrels.c |   8 +-
 src/backend/optimizer/util/relnode.c  | 263 +++++++++++++++++++++++++-
 src/include/optimizer/pathnode.h      |   4 +-
 src/test/regress/expected/join.out    |  39 ++++
 src/test/regress/sql/join.sql         |  26 +++
 5 files changed, 333 insertions(+), 7 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index b9163ee8ff..7e393c09f9 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -691,6 +691,7 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 	SpecialJoinInfo sjinfo_data;
 	RelOptInfo *joinrel;
 	List	   *restrictlist;
+	bool	innerrel_removed = false;
 
 	/* We should never try to join two overlapping sets of rels. */
 	Assert(!bms_overlap(rel1->relids, rel2->relids));
@@ -744,7 +745,7 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 	 * goes with this particular joining.
 	 */
 	joinrel = build_join_rel(root, joinrelids, rel1, rel2, sjinfo,
-							 &restrictlist);
+							 &restrictlist, &innerrel_removed);
 
 	/*
 	 * If we've already proven this join is empty, we needn't consider any
@@ -756,9 +757,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 		return joinrel;
 	}
 
+	if (!innerrel_removed)
 	/* Add paths to the join relation. */
-	populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
-								restrictlist);
+		populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
+									restrictlist);
 
 	bms_free(joinrelids);
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index a203e6f1ff..f488d87b1c 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -22,6 +22,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/inherit.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/placeholder.h"
@@ -73,6 +74,11 @@ static void build_child_join_reltarget(PlannerInfo *root,
 									   int nappinfos,
 									   AppendRelInfo **appinfos);
 
+static bool join_canbe_removed(PlannerInfo *root,
+							   SpecialJoinInfo *sjinfo,
+							   RelOptInfo *joinrel,
+							   RelOptInfo *innerrel,
+							   List *restrictlist);
 
 /*
  * setup_simple_rel_arrays
@@ -579,7 +585,8 @@ build_join_rel(PlannerInfo *root,
 			   RelOptInfo *outer_rel,
 			   RelOptInfo *inner_rel,
 			   SpecialJoinInfo *sjinfo,
-			   List **restrictlist_ptr)
+			   List **restrictlist_ptr,
+			   bool *innerrel_removed)
 {
 	RelOptInfo *joinrel;
 	List	   *restrictlist;
@@ -718,6 +725,64 @@ build_join_rel(PlannerInfo *root,
 	 */
 	joinrel->has_eclass_joins = has_relevant_eclass_joinclause(root, joinrel);
 
+	if (join_canbe_removed(root, sjinfo,
+						   joinrel, inner_rel,
+						   restrictlist))
+	{
+		ListCell *lc;
+
+		joinrel->rows = outer_rel->rows;
+		joinrel->consider_startup = outer_rel->consider_param_startup;
+		joinrel->consider_param_startup = outer_rel->consider_param_startup;
+		joinrel->consider_parallel = outer_rel->consider_parallel;
+
+		/* Rely on the projection path to reduce the tlist. */
+		joinrel->reltarget = outer_rel->reltarget;
+
+		joinrel->direct_lateral_relids = outer_rel->direct_lateral_relids;
+		joinrel->lateral_relids = outer_rel->lateral_relids;
+
+		joinrel->unique_for_rels = outer_rel->unique_for_rels;
+		joinrel->non_unique_for_rels = outer_rel->non_unique_for_rels;
+		joinrel->baserestrictinfo = outer_rel->baserestrictinfo;
+		joinrel->baserestrictcost = outer_rel->baserestrictcost;
+		joinrel->baserestrict_min_security = outer_rel->baserestrict_min_security;
+		joinrel->uniquekeys = outer_rel->uniquekeys;
+		joinrel->consider_partitionwise_join = outer_rel->consider_partitionwise_join;
+		joinrel->top_parent_relids = outer_rel->top_parent_relids;
+
+	/* Some scan paths need to know which base relation to scan; they use
+	 * the relid field, so we have to copy outer_rel->relid.
+	 */
+		joinrel->relid = outer_rel->relid;
+
+	/* Likewise, some paths assert that the rtekind is RTE_RELATION, so we
+	 * need to copy it from outer_rel as well.
+	 */
+		joinrel->rtekind = RTE_RELATION;
+
+	/* Make sure each path->parent points to the joinrel; the outer paths can't be updated in place. */
+		foreach(lc, outer_rel->pathlist)
+		{
+			Size sz = size_of_path(lfirst(lc));
+			Path *path = palloc(sz);
+			memcpy(path, lfirst(lc), sz);
+			path->parent = joinrel;
+			add_path(joinrel, path);
+		}
+
+	foreach(lc, outer_rel->partial_pathlist)
+		{
+			Size sz = size_of_path(lfirst(lc));
+			Path *path = palloc(sz);
+			memcpy(path, lfirst(lc), sz);
+			path->parent = joinrel;
+			add_partial_path(joinrel, path);
+		}
+		*innerrel_removed = true;
+	}
+	else
+	{
 	/* Store the partition information. */
 	build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
 								 sjinfo->jointype);
@@ -746,7 +811,7 @@ build_join_rel(PlannerInfo *root,
 		is_parallel_safe(root, (Node *) restrictlist) &&
 		is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
 		joinrel->consider_parallel = true;
-
+	}
 	/* Add the joinrel to the PlannerInfo. */
 	add_join_rel(root, joinrel);
 
@@ -759,11 +824,11 @@ build_join_rel(PlannerInfo *root,
 	if (root->join_rel_level)
 	{
 		Assert(root->join_cur_level > 0);
-		Assert(root->join_cur_level <= bms_num_members(joinrel->relids));
+		/* Assert(root->join_cur_level <= bms_num_members(joinrel->relids)); */
 		root->join_rel_level[root->join_cur_level] =
 			lappend(root->join_rel_level[root->join_cur_level], joinrel);
 	}
 
 	return joinrel;
 }
 
@@ -2027,3 +2099,141 @@ build_child_join_reltarget(PlannerInfo *root,
 	childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
 	childrel->reltarget->width = parentrel->reltarget->width;
 }
+
+static bool
+join_canbe_removed(PlannerInfo *root,
+				   SpecialJoinInfo *sjinfo,
+				   RelOptInfo *joinrel,
+				   RelOptInfo *innerrel,
+				   List *restrictlist)
+{
+	Bitmapset	*vars;
+	List	*exprs = NIL;
+	ListCell	*lc;
+	Bitmapset	*tmp;
+
+	if (sjinfo->jointype != JOIN_LEFT)
+		return false;
+
+	if (innerrel->uniquekeys == NIL)
+		return false;
+
+	/*
+	 * Check whether any of innerrel's columns are still needed above the
+	 * join; if so, the inner rel can't be removed.
+	 */
+	vars = pull_varnos((Node *) joinrel->reltarget->exprs);
+	tmp = bms_intersect(vars, innerrel->relids);
+	if (!bms_is_empty(tmp))
+		return false;
+
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+
+		if (rinfo->can_join)
+		{
+			if (rinfo->mergeopfamilies != NIL)
+			{
+				if (bms_is_subset(rinfo->left_relids, innerrel->relids))
+					exprs = lappend(exprs, get_leftop(rinfo->clause));
+				else if (bms_is_subset(rinfo->right_relids, innerrel->relids))
+					exprs = lappend(exprs, get_rightop(rinfo->clause));
+				else
+					Assert(false);
+			}
+			else
+				/* Not a mergeable join clause, we have to keep the join */
+				return false;
+		}
+		else
+		{
+			/*
+			 * The rinfo is not a join clause and it was not pushed down to
+			 * the base relation's baserestrictinfo, so it must come from
+			 * the ON clause, e.g. SELECT .. FROM t1 LEFT JOIN t2 ON
+			 * t1.a = 10. We can't remove the inner rel in this case either.
+			 */
+			return false;
+		}
+	}
+
+	return relation_has_uniquekeys_for(root, innerrel, exprs, true);
+}
+
+size_t
+size_of_path(Path *path)
+{
+	switch (path->type)
+	{
+		case T_Path:
+			return sizeof(Path);
+		case T_IndexPath:
+			return sizeof(IndexPath);
+		case T_BitmapHeapPath:
+			return sizeof(BitmapHeapPath);
+		case T_TidPath:
+			return sizeof(TidPath);
+		case T_SubqueryScanPath:
+			return sizeof(SubqueryScanPath);
+		case T_ForeignPath:
+			return sizeof(ForeignPath);
+		case T_CustomPath:
+			return sizeof(CustomPath);
+		case T_NestPath:
+			return sizeof(NestPath);
+		case T_MergePath:
+			return sizeof(MergePath);
+		case T_HashPath:
+			return sizeof(HashPath);
+		case T_AppendPath:
+			return sizeof(AppendPath);
+		case T_MergeAppendPath:
+			return sizeof(MergeAppendPath);
+		case T_GroupResultPath:
+			return sizeof(GroupResultPath);
+		case T_MaterialPath:
+			return sizeof(MaterialPath);
+		case T_UniquePath:
+			return sizeof(UniquePath);
+		case T_GatherPath:
+			return sizeof(GatherPath);
+		case T_GatherMergePath:
+			return sizeof(GatherMergePath);
+		case T_ProjectionPath:
+			return sizeof(ProjectionPath);
+		case T_ProjectSetPath:
+			return sizeof(ProjectSetPath);
+		case T_SortPath:
+			return sizeof(SortPath);
+		case T_IncrementalSortPath:
+			return sizeof(IncrementalSortPath);
+		case T_GroupPath:
+			return sizeof(GroupPath);
+		case T_UpperUniquePath:
+			return sizeof(UpperUniquePath);
+		case T_AggPath:
+			return sizeof(AggPath);
+		case T_GroupingSetsPath:
+			return sizeof(GroupingSetsPath);
+		case T_MinMaxAggPath:
+			return sizeof(MinMaxAggPath);
+		case T_WindowAggPath:
+			return sizeof(WindowAggPath);
+		case T_SetOpPath:
+			return sizeof(SetOpPath);
+		case T_RecursiveUnionPath:
+			return sizeof(RecursiveUnionPath);
+		case T_LockRowsPath:
+			return sizeof(LockRowsPath);
+		case T_ModifyTablePath:
+			return sizeof(ModifyTablePath);
+		case T_LimitPath:
+			return sizeof(LimitPath);
+		default:
+			elog(ERROR, "unrecognized path type: %d",
+				 (int) path->type);
+			break;
+	}
+	return 0;
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 715a24ad29..254961b2b4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -294,7 +294,8 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
 								  RelOptInfo *outer_rel,
 								  RelOptInfo *inner_rel,
 								  SpecialJoinInfo *sjinfo,
-								  List **restrictlist_ptr);
+								  List **restrictlist_ptr,
+								  bool *innerrel_removed);
 extern Relids min_join_parameterization(PlannerInfo *root,
 										Relids joinrelids,
 										RelOptInfo *outer_rel,
@@ -321,4 +322,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
 										RelOptInfo *parent_joinrel, List *restrictlist,
 										SpecialJoinInfo *sjinfo, JoinType jointype);
 
+extern size_t size_of_path(Path *path);
 #endif							/* PATHNODE_H */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 8378936eda..b59c9a73ae 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6283,3 +6283,42 @@ where exists (select 1 from j3
 (13 rows)
 
 drop table j3;
+create table m1 (a int primary key,  b int, c int);
+create table m2 (a int primary key,  b int, c int);
+create table m3 (a int primary key,  b int, c int);
+explain (verbose, costs off)
+select  t1.a
+from m3 t1
+left join (select m1.a from m1, m2 where m1.b = m2.a) t2
+on (t1.a = t2.a);
+        QUERY PLAN        
+--------------------------
+ Seq Scan on public.m3 t1
+   Output: t1.a
+(2 rows)
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on (m1.a = m2.a)
+and m1.b in (select b from m3);
+         QUERY PLAN         
+----------------------------
+ Seq Scan on public.m1
+   Output: m1.a, m1.b, m1.c
+(2 rows)
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on m1.b = m2.a
+and m2.b in (select b from m3);
+         QUERY PLAN         
+----------------------------
+ Seq Scan on public.m1
+   Output: m1.a, m1.b, m1.c
+(2 rows)
+
+drop table m1;
+drop table m2;
+drop table m3;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3312542411..317354547d 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2169,3 +2169,29 @@ where exists (select 1 from j3
       and t1.unique1 < 1;
 
 drop table j3;
+
+create table m1 (a int primary key,  b int, c int);
+create table m2 (a int primary key,  b int, c int);
+create table m3 (a int primary key,  b int, c int);
+
+explain (verbose, costs off)
+select  t1.a
+from m3 t1
+left join (select m1.a from m1, m2 where m1.b = m2.a) t2
+on (t1.a = t2.a);
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on (m1.a = m2.a)
+and m1.b in (select b from m3);
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on m1.b = m2.a
+and m2.b in (select b from m3);
+
+drop table m1;
+drop table m2;
+drop table m3;
-- 
2.21.0
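For readers who want to try the patch, the transformation it implements can be exercised with a query in the style of the regression tests above (same m1/m2 tables; the expected plan mirrors the new test output):

```sql
create table m1 (a int primary key, b int, c int);
create table m2 (a int primary key, b int, c int);

-- m2.a is a UniqueKey of m2 and no m2 column is needed above the join,
-- so the left join's inner side can be dropped entirely:
explain (verbose, costs off)
select m1.* from m1 left join m2 on m1.b = m2.a;
--          QUERY PLAN
-- ----------------------------
--  Seq Scan on public.m1
--    Output: m1.a, m1.b, m1.c
```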

#51Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#50)
6 attachment(s)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Fixed a test case in v10.

--
Best Regards
Andy Fan

Attachments:

v10-0002-Introduce-UniqueKey-attributes-on-RelOptInfo-str.patchapplication/octet-stream; name=v10-0002-Introduce-UniqueKey-attributes-on-RelOptInfo-str.patchDownload
From 66c4990e19706e46185bef19a9a2421f1c755cc4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 11 May 2020 15:50:52 +0800
Subject: [PATCH v10 2/6] Introduce UniqueKey attributes on RelOptInfo struct.

A UniqueKey is a set of exprs on a RelOptInfo which are known to be
unique on that RelOptInfo. See README.uniquekey for more
information.
---
 src/backend/nodes/copyfuncs.c               |   13 +
 src/backend/nodes/list.c                    |   31 +
 src/backend/nodes/makefuncs.c               |   13 +
 src/backend/nodes/outfuncs.c                |   11 +
 src/backend/nodes/readfuncs.c               |   10 +
 src/backend/optimizer/path/Makefile         |    3 +-
 src/backend/optimizer/path/README.uniquekey |  131 +++
 src/backend/optimizer/path/allpaths.c       |   10 +
 src/backend/optimizer/path/joinpath.c       |    9 +-
 src/backend/optimizer/path/joinrels.c       |    2 +
 src/backend/optimizer/path/pathkeys.c       |    3 +-
 src/backend/optimizer/path/uniquekeys.c     | 1131 +++++++++++++++++++
 src/backend/optimizer/plan/planner.c        |   13 +-
 src/backend/optimizer/prep/prepunion.c      |    2 +
 src/backend/optimizer/util/appendinfo.c     |   44 +
 src/backend/optimizer/util/inherit.c        |   16 +-
 src/include/nodes/makefuncs.h               |    3 +
 src/include/nodes/nodes.h                   |    1 +
 src/include/nodes/pathnodes.h               |   29 +-
 src/include/nodes/pg_list.h                 |    2 +
 src/include/optimizer/appendinfo.h          |    3 +
 src/include/optimizer/optimizer.h           |    2 +
 src/include/optimizer/paths.h               |   43 +
 23 files changed, 1502 insertions(+), 23 deletions(-)
 create mode 100644 src/backend/optimizer/path/README.uniquekey
 create mode 100644 src/backend/optimizer/path/uniquekeys.c

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 89c409de66..1f50400fd2 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2273,6 +2273,16 @@ _copyPathKey(const PathKey *from)
 	return newnode;
 }
 
+static UniqueKey *
+_copyUniqueKey(const UniqueKey *from)
+{
+	UniqueKey	*newnode = makeNode(UniqueKey);
+
+	COPY_NODE_FIELD(exprs);
+	COPY_SCALAR_FIELD(multi_nullvals);
+
+	return newnode;
+}
 /*
  * _copyRestrictInfo
  */
@@ -5152,6 +5162,9 @@ copyObjectImpl(const void *from)
 		case T_PathKey:
 			retval = _copyPathKey(from);
 			break;
+		case T_UniqueKey:
+			retval = _copyUniqueKey(from);
+			break;
 		case T_RestrictInfo:
 			retval = _copyRestrictInfo(from);
 			break;
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 80fa8c84e4..a7a99b70f2 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -687,6 +687,37 @@ list_member_oid(const List *list, Oid datum)
 	return false;
 }
 
+/*
+ * Return true iff every entry in the "members" list is also present
+ * in the "target" list.
+ */
+bool
+list_is_subset(const List *members, const List *target)
+{
+	const ListCell	*lc1, *lc2;
+
+	Assert(IsPointerList(members));
+	Assert(IsPointerList(target));
+	check_list_invariants(members);
+	check_list_invariants(target);
+
+	foreach(lc1, members)
+	{
+		bool found = false;
+		foreach(lc2, target)
+		{
+			if (equal(lfirst(lc1), lfirst(lc2)))
+			{
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			return false;
+	}
+	return true;
+}
+
 /*
  * Delete the n'th cell (counting from 0) in list.
  *
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 49de285f01..646cf7c9a1 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -814,3 +814,16 @@ makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols)
 	v->va_cols = va_cols;
 	return v;
 }
+
+/*
+ * makeUniqueKey
+ */
+UniqueKey *
+makeUniqueKey(List *exprs, bool multi_nullvals)
+{
+	UniqueKey  *ukey = makeNode(UniqueKey);
+
+	ukey->exprs = exprs;
+	ukey->multi_nullvals = multi_nullvals;
+	return ukey;
+}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e2f177515d..c3a9632992 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2428,6 +2428,14 @@ _outPathKey(StringInfo str, const PathKey *node)
 	WRITE_BOOL_FIELD(pk_nulls_first);
 }
 
+static void
+_outUniqueKey(StringInfo str, const UniqueKey *node)
+{
+	WRITE_NODE_TYPE("UNIQUEKEY");
+	WRITE_NODE_FIELD(exprs);
+	WRITE_BOOL_FIELD(multi_nullvals);
+}
+
 static void
 _outPathTarget(StringInfo str, const PathTarget *node)
 {
@@ -4127,6 +4135,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PathKey:
 				_outPathKey(str, obj);
 				break;
+			case T_UniqueKey:
+				_outUniqueKey(str, obj);
+				break;
 			case T_PathTarget:
 				_outPathTarget(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42050ab719..3a18571d0c 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -452,6 +452,14 @@ _readSetOperationStmt(void)
 	READ_DONE();
 }
 
+static UniqueKey *
+_readUniqueKey(void)
+{
+	READ_LOCALS(UniqueKey);
+	READ_NODE_FIELD(exprs);
+	READ_BOOL_FIELD(multi_nullvals);
+	READ_DONE();
+}
 
 /*
  *	Stuff from primnodes.h.
@@ -2656,6 +2664,8 @@ parseNodeString(void)
 		return_value = _readCommonTableExpr();
 	else if (MATCH("SETOPERATIONSTMT", 16))
 		return_value = _readSetOperationStmt();
+	else if (MATCH("UNIQUEKEY", 9))
+		return_value = _readUniqueKey();
 	else if (MATCH("ALIAS", 5))
 		return_value = _readAlias();
 	else if (MATCH("RANGEVAR", 8))
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..7b9820c25f 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekeys.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/README.uniquekey b/src/backend/optimizer/path/README.uniquekey
new file mode 100644
index 0000000000..5eac761995
--- /dev/null
+++ b/src/backend/optimizer/path/README.uniquekey
@@ -0,0 +1,129 @@
+1. What is UniqueKey?
+
+A UniqueKey is a set of exprs on a RelOptInfo which we are sure never
+yields the same result among all the rows of that relation. The simplest
+example of a UniqueKey is a primary key.
+
+We define UniqueKey as below:
+
+typedef struct UniqueKey
+{
+        NodeTag	type;
+        List	*exprs;
+        bool	multi_nullvals;
+} UniqueKey;
+
+exprs is a list of exprs which are unique on the current RelOptInfo.
+exprs = NIL is a special case of UniqueKey which means there is only one
+row in that relation; it has a stronger semantic than the others. Consider
+SELECT uk FROM t: uk is a normal unique key and may have different values.
+In SELECT colx FROM t WHERE uk = const, colx is unique AND we have only
+one value. This case can be used for innerrel_is_unique; the logic is
+handled specially in the add_uniquekey_for_onerow function.
+
+multi_nullvals = true means multiple null values may exist in these exprs,
+so the uniqueness is not guaranteed in that case. This field is necessary
+for remove_useless_joins & reduce_unique_semijoins, where we don't mind
+such duplicated NULL values. It is set to true in 2 cases. One is a unique
+key built from a unique index whose related column is nullable. The other
+one is the outer join case; see populate_joinrel_uniquekeys for details.
+
+The UniqueKey can be used in the following cases at least:
+1. remove_useless_joins.
+2. reduce_unique_semijoins.
+3. Remove the Distinct node if the distinct clause is unique.
+4. Remove the Agg node if the group by clause is unique.
+5. Index Skip Scan (WIP).
+6. Aggregation push down without 2-phase aggregation if the join can't
+   duplicate the aggregated rows (work-in-progress feature).
+
+2. How is it maintained?
+
+We have a set of populate_xxx_uniquekeys functions to maintain the
+uniquekeys in various cases, where xxx includes baserel, joinrel,
+partitionedrel, distinctrel, groupedrel and unionrel. We also need to
+convert the uniquekeys from a subquery to the outer relation, which is
+what convert_subquery_uniquekeys does.
+
+1. The first part is about baserels. We handle 3 cases; suppose we have a
+unique index on (a, b):
+
+1. SELECT a, b FROM t;  UniqueKey (a, b)
+2. SELECT a FROM t WHERE b = 1;  UniqueKey (a)
+3. SELECT .. FROM t WHERE a = 1 AND b = 1;  UniqueKey (NIL). The one-row
+   case: every column is unique.
+
+2. The next part is joinrels. This part is the most error-prone; we
+simplified the rules as below:
+1. If a relation's UniqueKey can't be duplicated by the join, it is still
+   valid for the join rel. The function used here is
+   innerrel_keeps_unique. The basic idea is innerrel.any_col = outer.uk.
+2. If a UniqueKey doesn't stay valid via rule 1, the combination of the
+   UniqueKeys from both sides is valid for sure. We can prove this as
+   follows: if the unique exprs from rel1 are duplicated by rel2, the
+   duplicated rows must contain different unique exprs from rel2.
+
+More considerations about onerow:
+1. Even if a one-row relation can't be duplicated, it may still contain
+   multi_nullvals after an outer join.
+2. If either side's UniqueKey can be duplicated by the join, we get one
+   row only when both sides are one row AND there is no outer join.
+3. Whenever a onerow UniqueKey is not valid any more, we need to convert
+   it to a normal unique key, since we don't store exprs for a one-row
+   relation; get_exprs_from_uniquekeys is used here.
+
+More considerations about multi_nullvals after a join:
+1. If the original UniqueKey has multi_nullvals, the resulting UniqueKey
+   has multi_nullvals in any case.
+2. Even if a UniqueKey doesn't allow multi_nullvals, it may allow them
+   after an outer join, since the nullable side can produce NULL values.
+
+3. When we come to subqueries, we need convert_subquery_uniquekeys, just
+like convert_subquery_pathkeys. Only a UniqueKey inside the subquery that
+is referenced as a Var in the outer relation can be reused. The
+relationship between the outerrel's Vars and the subquery's exprs is built
+with outerrel->subroot->processed_tlist.
+
+4. As for SRF functions, they break the uniqueness of a UniqueKey.
+However, that is handled in adjust_paths_for_srfs, which happens after
+query_planner, so we maintain the UniqueKey until there and reset it to
+NIL at that place. This can't help the distinct/group by elimination
+cases, but it probably helps in some other cases, like
+reduce_unique_semijoins/remove_useless_joins, and it is semantically
+correct.
+
+5. As for inheritance tables, we first maintain the UniqueKey on each
+child rel as usual. But for a partitioned table we need to maintain 2
+different kinds of UniqueKey: 1). the UniqueKey on the parent relation,
+and 2). the UniqueKey on the child relations for partition-wise queries.
+
+Example:
+CREATE TABLE p (a int not null, b int not null) partition by list (a);
+CREATE TABLE p0 partition of p for values in (1);
+CREATE TABLE p1 partition of p for values in (2);
+
+create unique index p0_b on p0(b);
+create unique index p1_b on p1(b);
+
+Now b is unique only at the partition level, so the distinct can't be
+removed in the following case: SELECT DISTINCT b FROM p;
+
+Another example is SELECT DISTINCT a, b FROM p WHERE a = 1. Since only
+one partition is chosen, the UniqueKey on the child relation is the same
+as the UniqueKey on the parent relation.
+
+Another usage of the UniqueKey at the partition level is that it can be
+helpful for partition-wise joins.
+
+The UniqueKey at the parent table level comes in 2 different ways:
+1). The UniqueKey is derived from a unique index as well, but the index
+must be the same in all the related child relations and the unique index
+must contain the partition key. Example:
+
+CREATE UNIQUE INDEX p_ab ON p(a, b);  -- where a is the partition key.
+
+-- Query
+SELECT a, b FROM p;  -- (a, b) is a UniqueKey of p.
+
+2). If the parent relation has only one child rel, the UniqueKey on the
+child rel is the UniqueKey on the parent as well.
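The baserel rules above can be restated as a SQL sketch (the table and index names here are illustrative, not from the patch):

```sql
create table t (a int not null, b int not null, c int);
create unique index t_a_b on t (a, b);

select a, b from t;                     -- UniqueKey (a, b)
select a from t where b = 1;            -- UniqueKey (a)
select c from t where a = 1 and b = 1;  -- UniqueKey (NIL): at most one row
```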
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 484dab0a1a..2ad9d06d7a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -579,6 +579,12 @@ set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	 */
 	check_index_predicates(root, rel);
 
+	/*
+	 * Now that we've marked which partial indexes are suitable, we can
+	 * build the relation's unique keys.
+	 */
+	populate_baserel_uniquekeys(root, rel, rel->indexlist);
+
 	/* Mark rel with estimated output rows, width, etc */
 	set_baserel_size_estimates(root, rel);
 }
@@ -1310,6 +1316,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Add paths to the append relation. */
 	add_paths_to_append_rel(root, rel, live_childrels);
+	if (IS_PARTITIONED_REL(rel))
+		populate_partitionedrel_uniquekeys(root, rel, live_childrels);
 }
 
 
@@ -2383,6 +2391,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 										  pathkeys, required_outer));
 	}
 
+	convert_subquery_uniquekeys(root, rel, sub_final_rel);
+
 	/* If outer rel allows parallelism, do same for partial paths. */
 	if (rel->consider_parallel && bms_is_empty(required_outer))
 	{
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index db54a6ba2e..ef0fd2fb0b 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -71,13 +71,6 @@ static void consider_parallel_mergejoin(PlannerInfo *root,
 static void hash_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
-static List *select_mergejoin_clauses(PlannerInfo *root,
-									  RelOptInfo *joinrel,
-									  RelOptInfo *outerrel,
-									  RelOptInfo *innerrel,
-									  List *restrictlist,
-									  JoinType jointype,
-									  bool *mergejoin_allowed);
 static void generate_mergejoin_paths(PlannerInfo *root,
 									 RelOptInfo *joinrel,
 									 RelOptInfo *innerrel,
@@ -1927,7 +1920,7 @@ hash_inner_and_outer(PlannerInfo *root,
  * if it is mergejoinable and involves vars from the two sub-relations
  * currently of interest.
  */
-static List *
+List *
 select_mergejoin_clauses(PlannerInfo *root,
 						 RelOptInfo *joinrel,
 						 RelOptInfo *outerrel,
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 2d343cd293..b9163ee8ff 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -924,6 +924,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
 
 	/* Apply partitionwise join technique, if possible. */
 	try_partitionwise_join(root, rel1, rel2, joinrel, sjinfo, restrictlist);
+
+	populate_joinrel_uniquekeys(root, joinrel, rel1, rel2, restrictlist, sjinfo->jointype);
 }
 
 
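populate_joinrel_uniquekeys is the hook that keeps uniqueness alive across joins. As an illustration of joinrel rule 2 from README.uniquekey (the combination of both sides' UniqueKeys remains unique), with illustrative tables:

```sql
create table t1 (a int primary key, b int);
create table t2 (a int primary key, b int);

-- Neither primary key survives alone (each side can be duplicated by
-- the join), but the combination (t1.a, t2.a) is a UniqueKey of the
-- join rel:
select t1.a, t2.a from t1 join t2 on t1.b = t2.b;
```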
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index ce9bf87e9b..7e596d4194 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -33,7 +33,6 @@ static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
 											 int partkeycol);
-static Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 
 
@@ -1035,7 +1034,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
  * We need this to ensure that we don't return pathkeys describing values
  * that are unavailable above the level of the subquery scan.
  */
-static Var *
+Var *
 find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle)
 {
 	ListCell   *lc;
diff --git a/src/backend/optimizer/path/uniquekeys.c b/src/backend/optimizer/path/uniquekeys.c
new file mode 100644
index 0000000000..b33bcd2f32
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekeys.c
@@ -0,0 +1,1131 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekeys.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/appendinfo.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/tlist.h"
+#include "rewrite/rewriteManip.h"
+
+
+/*
+ * This struct is used to help populate_joinrel_uniquekeys.
+ *
+ * added_to_joinrel is true if a uniquekey (from outerrel or innerrel)
+ * has been added to joinrel.
+ * useful is true if the exprs of the uniquekey still appear in joinrel.
+ */
+typedef struct UniqueKeyContextData
+{
+	UniqueKey	*uniquekey;
+	bool	added_to_joinrel;
+	bool	useful;
+} *UniqueKeyContext;
+
+static List *initililze_uniquecontext_for_joinrel(RelOptInfo *inputrel);
+static bool innerrel_keeps_unique(PlannerInfo *root,
+								  RelOptInfo *outerrel,
+								  RelOptInfo *innerrel,
+								  List *restrictlist,
+								  bool reverse);
+
+static List *get_exprs_from_uniqueindex(IndexOptInfo *unique_index,
+										List *const_exprs,
+										List *const_expr_opfamilies,
+										Bitmapset *used_varattrs,
+										bool *useful,
+										bool *multi_nullvals);
+static List *get_exprs_from_uniquekey(RelOptInfo *joinrel,
+									  RelOptInfo *rel1,
+									  UniqueKey *ukey);
+static void add_uniquekey_for_onerow(RelOptInfo *rel);
+static bool add_combined_uniquekey(RelOptInfo *joinrel,
+								   RelOptInfo *outer_rel,
+								   RelOptInfo *inner_rel,
+								   UniqueKey *outer_ukey,
+								   UniqueKey *inner_ukey,
+								   JoinType jointype);
+
+/* Used for unique indexes checking for partitioned table */
+static bool index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind);
+static IndexOptInfo *simple_copy_indexinfo_to_parent(PlannerInfo *root,
+													 RelOptInfo *parentrel,
+													 IndexOptInfo *from);
+static bool simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2);
+static void adjust_partition_unique_indexlist(PlannerInfo *root,
+											  RelOptInfo *parentrel,
+											  RelOptInfo *childrel,
+											  List **global_unique_index);
+
+/* Helper function for grouped relation and distinct relation. */
+static void add_uniquekey_from_sortgroups(PlannerInfo *root,
+										  RelOptInfo *rel,
+										  List *sortgroups);
+
+/*
+ * populate_baserel_uniquekeys
+ *		Populate the baserel's uniquekeys list by looking at the rel's unique
+ *		indexes and its baserestrictinfo.
+ */
+void
+populate_baserel_uniquekeys(PlannerInfo *root,
+							RelOptInfo *baserel,
+							List *indexlist)
+{
+	ListCell *lc;
+	List	*matched_uniq_indexes = NIL;
+
+	/* Attrs appearing in rel->reltarget->exprs. */
+	Bitmapset *used_attrs = NULL;
+
+	List	*const_exprs = NIL;
+	List	*expr_opfamilies = NIL;
+
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	foreach(lc, indexlist)
+	{
+		IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+		matched_uniq_indexes = lappend(matched_uniq_indexes, ind);
+	}
+
+	if (matched_uniq_indexes == NIL)
+		return;
+
+	/* Check which attrs are used in baserel->reltarget */
+	pull_varattnos((Node *)baserel->reltarget->exprs, baserel->relid, &used_attrs);
+
+	/* Check which attrs are used in mergeable const filters */
+	foreach(lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (rinfo->mergeopfamilies == NIL)
+			continue;
+
+		if (bms_is_empty(rinfo->left_relids))
+		{
+			const_exprs = lappend(const_exprs, get_rightop(rinfo->clause));
+		}
+		else if (bms_is_empty(rinfo->right_relids))
+		{
+			const_exprs = lappend(const_exprs, get_leftop(rinfo->clause));
+		}
+		else
+			continue;
+
+		expr_opfamilies = lappend(expr_opfamilies, rinfo->mergeopfamilies);
+	}
+
+	foreach(lc, matched_uniq_indexes)
+	{
+		bool	multi_nullvals, useful;
+		List	*exprs = get_exprs_from_uniqueindex(lfirst_node(IndexOptInfo, lc),
+													const_exprs,
+													expr_opfamilies,
+													used_attrs,
+													&useful,
+													&multi_nullvals);
+		if (useful)
+		{
+			if (exprs == NIL)
+			{
+				/* All the columns in the unique index matched a const restrictinfo */
+				add_uniquekey_for_onerow(baserel);
+				return;
+			}
+			baserel->uniquekeys = lappend(baserel->uniquekeys,
+										  makeUniqueKey(exprs, multi_nullvals));
+		}
+	}
+}
+
+
+/*
+ * populate_partitionedrel_uniquekeys
+ * The UniqueKey on a partitioned rel comes from 2 cases:
+ * 1). Only one partition is involved in the query; the unique keys can be
+ * copied to the parent rel from that childrel.
+ * 2). There is a unique index which includes the partition key and exists
+ * in all the related partitions.
+ * We don't bother with rule 2 if rule 1 applies.
+ */
+
+void
+populate_partitionedrel_uniquekeys(PlannerInfo *root,
+								   RelOptInfo *rel,
+								   List *childrels)
+{
+	ListCell	*lc;
+	List	*global_uniq_indexlist = NIL;
+	RelOptInfo *childrel;
+	bool is_first = true;
+
+	Assert(IS_PARTITIONED_REL(rel));
+
+	if (childrels == NIL)
+		return;
+
+	/*
+	 * If only one partition is used in this query, the UniqueKeys of that
+	 * childrel are still valid at the parent level, but we need to convert
+	 * the exprs from the child form to the parent form.
+	 */
+	if (list_length(childrels) == 1)
+	{
+		/* Check for Rule 1 */
+		RelOptInfo *childrel = linitial_node(RelOptInfo, childrels);
+		ListCell	*lc;
+		Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
+		if (relation_is_onerow(childrel))
+		{
+			add_uniquekey_for_onerow(rel);
+			return;
+		}
+
+		foreach(lc, childrel->uniquekeys)
+		{
+			UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+			AppendRelInfo *appinfo = find_appinfo_by_child(root, childrel->relid);
+			List *parent_exprs = NIL;
+			bool can_reuse = true;
+			ListCell	*lc2;
+			foreach(lc2, ukey->exprs)
+			{
+				Var *var = (Var *)lfirst(lc2);
+				/*
+				 * If the expr is not a plain Var (e.g. it comes from an
+				 * expression), it is hard to build the matching expression
+				 * in the parent, so ignore that case for now.
+				 */
+				if (!IsA(var, Var))
+				{
+					can_reuse = false;
+					break;
+				}
+				/* Convert it to parent var */
+				parent_exprs = lappend(parent_exprs, find_parent_var(appinfo, var));
+			}
+			if (can_reuse)
+				rel->uniquekeys = lappend(rel->uniquekeys,
+										  makeUniqueKey(parent_exprs,
+														ukey->multi_nullvals));
+		}
+	}
+	else
+	{
+		/* Check for rule 2 */
+		childrel = linitial_node(RelOptInfo, childrels);
+		foreach(lc, childrel->indexlist)
+		{
+			IndexOptInfo *ind = lfirst(lc);
+			IndexOptInfo *modified_index;
+			if (!ind->unique || !ind->immediate ||
+				(ind->indpred != NIL && !ind->predOK))
+				continue;
+
+			/*
+			 * During simple_copy_indexinfo_to_parent, we need to convert each
+			 * child Var to the corresponding parent Var.  An index on an
+			 * expression is too complex to handle, so ignore it for now.
+			 */
+			if (ind->indexprs != NIL)
+				continue;
+
+			modified_index = simple_copy_indexinfo_to_parent(root, rel, ind);
+			/*
+			 * If the unique index doesn't contain the partition key, then it
+			 * is unique on this partition only, so it is useless for us.
+			 */
+			if (!index_constains_partkey(rel, modified_index))
+				continue;
+
+			global_uniq_indexlist = lappend(global_uniq_indexlist, modified_index);
+		}
+
+		if (global_uniq_indexlist != NIL)
+		{
+			foreach(lc, childrels)
+			{
+				RelOptInfo *child = lfirst(lc);
+				if (is_first)
+				{
+					is_first = false;
+					continue;
+				}
+				adjust_partition_unique_indexlist(root, rel, child, &global_uniq_indexlist);
+			}
+			/*
+			 * Now we have a list of unique indexes which are exactly the same
+			 * on all childrels.  Set the UniqueKeys just as for a
+			 * non-partitioned table.
+			 */
+			populate_baserel_uniquekeys(root, rel, global_uniq_indexlist);
+		}
+	}
+}
+
+
+/*
+ * populate_distinctrel_uniquekeys
+ */
+void
+populate_distinctrel_uniquekeys(PlannerInfo *root,
+								RelOptInfo *inputrel,
+								RelOptInfo *distinctrel)
+{
+	/* The unique key before the distinct is still valid. */
+	distinctrel->uniquekeys = list_copy(inputrel->uniquekeys);
+	add_uniquekey_from_sortgroups(root, distinctrel, root->parse->distinctClause);
+}
+
+/*
+ * populate_grouprel_uniquekeys
+ */
+void
+populate_grouprel_uniquekeys(PlannerInfo *root,
+							 RelOptInfo *grouprel,
+							 RelOptInfo *inputrel)
+
+{
+	Query *parse = root->parse;
+	bool input_ukey_added = false;
+	ListCell *lc;
+
+	if (relation_is_onerow(inputrel))
+	{
+		add_uniquekey_for_onerow(grouprel);
+		return;
+	}
+	if (parse->groupingSets)
+		return;
+
+	/* A Normal group by without grouping set. */
+	if (parse->groupClause)
+	{
+		/*
+		 * Currently, even if the group by clause is unique already, we still
+		 * have to create the grouprel when the query has any aggref.  To keep
+		 * the UniqueKeys short, we check whether any UniqueKey of the
+		 * input_rel is still valid; if so, we reuse it.
+		 */
+		foreach(lc, inputrel->uniquekeys)
+		{
+			UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+			if (list_is_subset(ukey->exprs, grouprel->reltarget->exprs))
+			{
+				grouprel->uniquekeys = lappend(grouprel->uniquekeys,
+											   ukey);
+				input_ukey_added = true;
+			}
+		}
+		if (!input_ukey_added)
+			/*
+			 * The group by clause must be a superset of
+			 * grouprel->reltarget->exprs except for the aggregation exprs, so
+			 * if such exprs are unique already, there is no need to generate
+			 * a new UniqueKey for the group by exprs.
+			 */
+			add_uniquekey_from_sortgroups(root,
+										  grouprel,
+										  root->parse->groupClause);
+	}
+	else
+		/* It has aggregation but without a group by, so only one row returned */
+		add_uniquekey_for_onerow(grouprel);
+}
+
+/*
+ * simple_copy_uniquekeys
+ * Wrapping this one-line assignment in a function makes it easy to find the
+ * places where we simply copy the uniquekeys.
+ */
+void
+simple_copy_uniquekeys(RelOptInfo *oldrel,
+					   RelOptInfo *newrel)
+{
+	newrel->uniquekeys = oldrel->uniquekeys;
+}
+
+/*
+ *  populate_unionrel_uniquekeys
+ */
+void
+populate_unionrel_uniquekeys(PlannerInfo *root,
+							  RelOptInfo *unionrel)
+{
+	ListCell	*lc;
+	List	*exprs = NIL;
+
+	Assert(unionrel->uniquekeys == NIL);
+
+	foreach(lc, unionrel->reltarget->exprs)
+	{
+		exprs = lappend(exprs, lfirst(lc));
+	}
+
+	if (exprs == NIL)
+		/* "SELECT UNION SELECT" is valid SQL, so we need to handle it here. */
+		add_uniquekey_for_onerow(unionrel);
+	else
+		unionrel->uniquekeys = lappend(unionrel->uniquekeys,
+		unionrel->uniquekeys = lappend(unionrel->uniquekeys,
+									   makeUniqueKey(exprs, false));
+}
+
+/*
+ * populate_joinrel_uniquekeys
+ *
+ * Populate uniquekeys for the joinrel.  We check each input relation to see
+ * whether its UniqueKeys remain valid via innerrel_keeps_unique; if so, we
+ * add them to the joinrel.  The multi_nullvals field is changed to true for
+ * some outer join cases, and a one-row UniqueKey needs to be converted to a
+ * normal UniqueKey in those cases as well.
+ * For the UniqueKeys of either input rel which can't stay unique after the
+ * join, we still check whether a combination of UniqueKeys from both sides
+ * is useful for us; if so, we add it to the joinrel as well.
+ */
+void
+populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List *restrictlist, JoinType jointype)
+{
+	ListCell *lc, *lc2;
+	List	*clause_list = NIL;
+	List	*outerrel_ukey_ctx;
+	List	*innerrel_ukey_ctx;
+	bool	inner_onerow, outer_onerow;
+	bool	mergejoin_allowed;
+
+	/* Only the outerrel's uniquekeys matter for SEMI/ANTI joins */
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+	{
+		foreach(lc, outerrel->uniquekeys)
+		{
+			UniqueKey	*uniquekey = lfirst_node(UniqueKey, lc);
+			if (list_is_subset(uniquekey->exprs, joinrel->reltarget->exprs))
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, uniquekey);
+		}
+		return;
+	}
+
+	Assert(jointype == JOIN_LEFT || jointype == JOIN_FULL || jointype == JOIN_INNER);
+
+	/* Fast path */
+	if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+		return;
+
+	inner_onerow = relation_is_onerow(innerrel);
+	outer_onerow = relation_is_onerow(outerrel);
+
+	outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(outerrel);
+	innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(innerrel);
+
+	clause_list = select_mergejoin_clauses(root, joinrel, outerrel, innerrel,
+										   restrictlist, jointype,
+										   &mergejoin_allowed);
+
+	if (innerrel_keeps_unique(root, innerrel, outerrel, clause_list, true /* reverse */))
+	{
+		bool outer_impact = jointype == JOIN_FULL;
+		foreach(lc, outerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+
+			/*
+			 * If the outer relation has one row and its unique key is not
+			 * duplicated after the join, the joinrel still has one row,
+			 * unless jointype == JOIN_FULL.
+			 */
+			if (outer_onerow && !outer_impact)
+			{
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else if (outer_onerow)
+			{
+				/*
+				 * The onerow outerrel becomes multi rows and multi_nullvals
+				 * will be changed to true. We also need to set the exprs correctly since it
+				 * can't be NIL any more.
+				 */
+				ListCell *lc2;
+				foreach(lc2, get_exprs_from_uniquekey(joinrel, outerrel, NULL))
+				{
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(lfirst(lc2), true));
+				}
+			}
+			else
+			{
+				if (!ctx->uniquekey->multi_nullvals && outer_impact)
+					/* Change multi_nullvals to true due to the full join. */
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(ctx->uniquekey->exprs, true));
+				else
+					/* Just reuse it */
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  ctx->uniquekey);
+			}
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+	{
+		bool outer_impact = jointype == JOIN_FULL || jointype == JOIN_LEFT;
+
+		foreach(lc, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+
+			if (inner_onerow && !outer_impact)
+			{
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else if (inner_onerow)
+			{
+				ListCell *lc2;
+				foreach(lc2, get_exprs_from_uniquekey(joinrel, innerrel, NULL))
+				{
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(lfirst(lc2), true));
+				}
+			}
+			else
+			{
+				if (!ctx->uniquekey->multi_nullvals && outer_impact)
+					/* Need to change multi_nullvals to true due to the outer join. */
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(ctx->uniquekey->exprs,
+																true));
+				else
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  ctx->uniquekey);
+			}
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	/*
+	 * The combination of UniqueKeys from both sides is unique as well,
+	 * regardless of the join type, but there is no need to add it if a subset
+	 * of it has been added to the joinrel already, or if it is not useful for
+	 * the joinrel.
+	 */
+	foreach(lc, outerrel_ukey_ctx)
+	{
+		UniqueKeyContext ctx1 = (UniqueKeyContext) lfirst(lc);
+		if (ctx1->added_to_joinrel || !ctx1->useful)
+			continue;
+		foreach(lc2, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx2 = (UniqueKeyContext) lfirst(lc2);
+			if (ctx2->added_to_joinrel || !ctx2->useful)
+				continue;
+			if (add_combined_uniquekey(joinrel, outerrel, innerrel,
+									   ctx1->uniquekey, ctx2->uniquekey,
+									   jointype))
+				/* If we set a onerow UniqueKey on the joinrel, we don't need any others. */
+				return;
+		}
+	}
+}
+
+
+/*
+ * convert_subquery_uniquekeys
+ *
+ * Convert the UniqueKeys of the subquery into the outer relation's notation.
+ */
+void
+convert_subquery_uniquekeys(PlannerInfo *root,
+							RelOptInfo *currel,
+							RelOptInfo *sub_final_rel)
+{
+	ListCell	*lc;
+
+	if (sub_final_rel->uniquekeys == NIL)
+		return;
+
+	if (relation_is_onerow(sub_final_rel))
+	{
+		add_uniquekey_for_onerow(currel);
+		return;
+	}
+
+	Assert(currel->subroot != NULL);
+
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		ListCell	*lc;
+		List	*exprs = NIL;
+		bool	ukey_useful = true;
+
+		/* One row case is handled above */
+		Assert(ukey->exprs != NIL);
+		foreach(lc, ukey->exprs)
+		{
+			Var *var;
+			TargetEntry *tle = tlist_member(lfirst(lc),
+											currel->subroot->processed_tlist);
+			if (tle == NULL)
+			{
+				ukey_useful = false;
+				break;
+			}
+			var = find_var_for_subquery_tle(currel, tle);
+			if (var == NULL)
+			{
+				ukey_useful = false;
+				break;
+			}
+			exprs = lappend(exprs, var);
+		}
+
+		if (ukey_useful)
+			currel->uniquekeys = lappend(currel->uniquekeys,
+										 makeUniqueKey(exprs,
+													   ukey->multi_nullvals));
+	}
+}
+
+/*
+ * innerrel_keeps_unique
+ *
+ * Check whether the UniqueKeys of the innerrel remain valid after the join.
+ * They do if every expr of some outerrel UniqueKey is matched in clause_list
+ * by a mergeable clause against an innerrel column, since then each innerrel
+ * row can join with at most one outerrel row.
+ *
+ * Note: clause_list must contain mergeable RestrictInfos only.
+ */
+static bool
+innerrel_keeps_unique(PlannerInfo *root,
+					  RelOptInfo *outerrel,
+					  RelOptInfo *innerrel,
+					  List *clause_list,
+					  bool reverse)
+{
+	ListCell	*lc, *lc2, *lc3;
+
+	if (outerrel->uniquekeys == NIL || innerrel->uniquekeys == NIL)
+		return false;
+
+	/* Check if there is outerrel's uniquekey in mergeable clause. */
+	foreach(lc, outerrel->uniquekeys)
+	{
+		List	*outer_uq_exprs = lfirst_node(UniqueKey, lc)->exprs;
+		bool clauselist_matchs_all_exprs = true;
+		foreach(lc2, outer_uq_exprs)
+		{
+			Node *outer_uq_expr = lfirst(lc2);
+			bool find_uq_expr_in_clauselist = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *outer_expr;
+				if (reverse)
+					outer_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				else
+					outer_expr = rinfo->outer_is_left ? get_leftop(rinfo->clause) : get_rightop(rinfo->clause);
+				if (equal(outer_expr, outer_uq_expr))
+				{
+					find_uq_expr_in_clauselist = true;
+					break;
+				}
+			}
+			if (!find_uq_expr_in_clauselist)
+			{
+				/* No need to check the next exprs in the current uniquekey */
+				clauselist_matchs_all_exprs = false;
+				break;
+			}
+		}
+
+		if (clauselist_matchs_all_exprs)
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * relation_is_onerow
+ * Check if it is a one-row relation by checking UniqueKey.
+ */
+bool
+relation_is_onerow(RelOptInfo *rel)
+{
+	UniqueKey *ukey;
+	if (rel->uniquekeys == NIL)
+		return false;
+	ukey = linitial_node(UniqueKey, rel->uniquekeys);
+	return ukey->exprs == NIL && list_length(rel->uniquekeys) == 1;
+}
+
+/*
+ * relation_has_uniquekeys_for
+ *		Returns true if we have proofs that 'rel' cannot return multiple rows with
+ *		the same values in each of 'exprs'.  Otherwise returns false.
+ */
+bool
+relation_has_uniquekeys_for(PlannerInfo *root, RelOptInfo *rel,
+							List *exprs, bool allow_multinulls)
+{
+	ListCell *lc;
+
+	/*
+	 * In the one-row case, uniquekey->exprs is NIL as well, so we can't rely
+	 * on list_is_subset to handle that special case.
+	 */
+	if (exprs == NIL)
+		return false;
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		if (ukey->multi_nullvals && !allow_multinulls)
+			continue;
+		if (list_is_subset(ukey->exprs, exprs))
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * get_exprs_from_uniqueindex
+ *
+ * Return a list of exprs which are unique.  Set *useful to false if this
+ * unique index is not useful for us.
+ */
+static List *
+get_exprs_from_uniqueindex(IndexOptInfo *unique_index,
+						   List *const_exprs,
+						   List *const_expr_opfamilies,
+						   Bitmapset *used_varattrs,
+						   bool *useful,
+						   bool *multi_nullvals)
+{
+	List	*exprs = NIL;
+	ListCell	*indexpr_item;
+	int	c = 0;
+
+	*useful = true;
+	*multi_nullvals = false;
+
+	indexpr_item = list_head(unique_index->indexprs);
+	for(c = 0; c < unique_index->ncolumns; c++)
+	{
+		int attr = unique_index->indexkeys[c];
+		Expr *expr;
+		bool	matched_const = false;
+		ListCell	*lc1, *lc2;
+
+		if (attr > 0)
+		{
+			expr = list_nth_node(TargetEntry, unique_index->indextlist, c)->expr;
+		}
+		else if (attr == 0)
+		{
+			/* Expression index */
+			expr = lfirst(indexpr_item);
+			indexpr_item = lnext(unique_index->indexprs, indexpr_item);
+		}
+		else /* attr < 0 */
+		{
+			/* Keep expr defined even in non-assert builds */
+			elog(ERROR, "index on system column is not supported");
+		}
+
+		/*
+		 * Check the index_col = Const case with regard to the opfamily, to
+		 * see whether we can remove this index_col from the final
+		 * UniqueKey->exprs.
+		 */
+		forboth(lc1, const_exprs, lc2, const_expr_opfamilies)
+		{
+			if (list_member_oid((List *)lfirst(lc2), unique_index->opfamily[c])
+				&& match_index_to_operand((Node *) lfirst(lc1), c, unique_index))
+			{
+				matched_const = true;
+				break;
+			}
+		}
+
+		if (matched_const)
+			continue;
+
+		/* Check if the indexed expr is used in rel */
+		if (attr > 0)
+		{
+			/*
+			 * Normal indexed column: if the column is not used, then the
+			 * index is useless for a uniquekey.
+			 */
+			attr -= FirstLowInvalidHeapAttributeNumber;
+
+			if (!bms_is_member(attr, used_varattrs))
+			{
+				*useful = false;
+				break;
+			}
+		}
+		else if (!list_member(unique_index->rel->reltarget->exprs, expr))
+		{
+			/* Expression index but the expression is not used in rel */
+			*useful = false;
+			break;
+		}
+
+		/* check not null property. */
+		if (attr == 0)
+		{
+			/* We never know whether an expression yields NULL or not */
+			*multi_nullvals = true;
+		}
+		else if (!bms_is_member(attr, unique_index->rel->notnullattrs)
+				 && !bms_is_member(0 - FirstLowInvalidHeapAttributeNumber,
+								   unique_index->rel->notnullattrs))
+		{
+			*multi_nullvals = true;
+		}
+
+		exprs = lappend(exprs, expr);
+	}
+	return exprs;
+}
+
+
+/*
+ * add_uniquekey_for_onerow
+ * If we are sure the relation returns only one row, then all the columns are
+ * unique.  We don't need to create a UniqueKey for every column though; we
+ * just set exprs = NIL, overwriting all the other UniqueKeys on this
+ * RelOptInfo, since this one has the strongest semantics.
+ */
+void
+add_uniquekey_for_onerow(RelOptInfo *rel)
+{
+	/*
+	 * We overwrite the previous UniqueKeys on purpose since this one has the
+	 * strongest semantics.
+	 */
+	rel->uniquekeys = list_make1(makeUniqueKey(NIL, false));
+}
+
+
+/*
+ * initililze_uniquecontext_for_joinrel
+ * Return a List of UniqueKeyContext for an inputrel
+ */
+static List *
+initililze_uniquecontext_for_joinrel(RelOptInfo *inputrel)
+{
+	List	*res = NIL;
+	ListCell *lc;
+	foreach(lc, inputrel->uniquekeys)
+	{
+		UniqueKeyContext context;
+		context = palloc(sizeof(struct UniqueKeyContextData));
+		context->uniquekey = lfirst_node(UniqueKey, lc);
+		context->added_to_joinrel = false;
+		context->useful = true;
+		res = lappend(res, context);
+	}
+	return res;
+}
+
+
+/*
+ * get_exprs_from_uniquekey
+ *	Unify the way of getting a List of expr-lists from a one-row UniqueKey or
+ * a normal UniqueKey.  For the one-row case, every expr of rel1 that survives
+ * in the joinrel is a valid UniqueKey on its own.  Returns a List of Lists
+ * of exprs.
+ *
+ * rel1: the relation whose exprs you want.
+ * ukey: the UniqueKey whose exprs you want.
+ */
+static List *
+get_exprs_from_uniquekey(RelOptInfo *joinrel, RelOptInfo *rel1, UniqueKey *ukey)
+{
+	ListCell *lc;
+	bool onerow = rel1 != NULL && relation_is_onerow(rel1);
+
+	List	*res = NIL;
+	Assert(onerow || ukey);
+	if (onerow)
+	{
+		/* Only care about the exprs which still exist in the joinrel */
+		foreach(lc, joinrel->reltarget->exprs)
+		{
+			Bitmapset *relids = pull_varnos(lfirst(lc));
+			if (bms_is_subset(relids, rel1->relids))
+			{
+				res = lappend(res, list_make1(lfirst(lc)));
+			}
+		}
+	}
+	else
+	{
+		res = list_make1(ukey->exprs);
+	}
+	return res;
+}
+
+/*
+ * Partitioned table UniqueKeys.
+ * A unique key for a partitioned table is maintained as follows:
+ * 1. The index must be unique, as usual.
+ * 2. The index must contain the partition key.
+ * 3. The index must exist on all the child rels; see simple_indexinfo_equal
+ *    for how we compare them.
+ */
+
+/*
+ * index_constains_partkey
+ * Return true if the index contains the partition key.
+ */
+static bool
+index_constains_partkey(RelOptInfo *partrel, IndexOptInfo *ind)
+{
+	ListCell	*lc;
+	int	i;
+	Assert(IS_PARTITIONED_REL(partrel));
+	Assert(partrel->part_scheme->partnatts > 0);
+
+	for (i = 0; i < partrel->part_scheme->partnatts; i++)
+	{
+		Node *part_expr = linitial(partrel->partexprs[i]);
+		bool found_in_index = false;
+		foreach(lc, ind->indextlist)
+		{
+			Expr *index_expr = lfirst_node(TargetEntry, lc)->expr;
+			if (equal(index_expr, part_expr))
+			{
+				found_in_index = true;
+				break;
+			}
+		}
+		if (!found_in_index)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * simple_indexinfo_equal
+ *
+ * Check whether the 2 indexes are the same.  The indexes here have been
+ * COPIED from the childrels with some tiny changes (see
+ * simple_copy_indexinfo_to_parent).
+ */
+static bool
+simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2)
+{
+	Size oid_cmp_len = sizeof(Oid) * ind1->ncolumns;
+
+	return ind1->ncolumns == ind2->ncolumns &&
+		ind1->unique == ind2->unique &&
+		memcmp(ind1->indexkeys, ind2->indexkeys, sizeof(int) * ind1->ncolumns) == 0 &&
+		memcmp(ind1->opfamily, ind2->opfamily, oid_cmp_len) == 0 &&
+		memcmp(ind1->opcintype, ind2->opcintype, oid_cmp_len) == 0 &&
+		memcmp(ind1->sortopfamily, ind2->sortopfamily, oid_cmp_len) == 0 &&
+		equal(get_tlist_exprs(ind1->indextlist, true),
+			  get_tlist_exprs(ind2->indextlist, true));
+}
+
+
+/*
+ * The macros below are used by simple_copy_indexinfo_to_parent, which is so
+ * customized that it doesn't belong in copyfuncs.c, so they are duplicated
+ * here.
+ */
+#define COPY_POINTER_FIELD(fldname, sz) \
+	do { \
+		Size	_size = (sz); \
+		newnode->fldname = palloc(_size); \
+		memcpy(newnode->fldname, from->fldname, _size); \
+	} while (0)
+
+#define COPY_NODE_FIELD(fldname) \
+	(newnode->fldname = copyObjectImpl(from->fldname))
+
+#define COPY_SCALAR_FIELD(fldname) \
+	(newnode->fldname = from->fldname)
+
+
+/*
+ * simple_copy_indexinfo_to_parent (from a partition)
+ * Copy the IndexOptInfo from a child relation to the parent relation with
+ * some modifications, which is used to test:
+ * 1. Whether the same index exists in all the childrels.
+ * 2. Whether the parentrel's reltarget/baserestrictinfo matches this index.
+ */
+static IndexOptInfo *
+simple_copy_indexinfo_to_parent(PlannerInfo *root,
+								RelOptInfo *parentrel,
+								IndexOptInfo *from)
+{
+	IndexOptInfo *newnode = makeNode(IndexOptInfo);
+	AppendRelInfo *appinfo = find_appinfo_by_child(root, from->rel->relid);
+	ListCell	*lc;
+	int	idx = 0;
+
+	COPY_SCALAR_FIELD(ncolumns);
+	COPY_SCALAR_FIELD(nkeycolumns);
+	COPY_SCALAR_FIELD(unique);
+	COPY_SCALAR_FIELD(immediate);
+	/* We just need to know if it is NIL or not */
+	COPY_SCALAR_FIELD(indpred);
+	COPY_SCALAR_FIELD(predOK);
+	COPY_POINTER_FIELD(indexkeys, from->ncolumns * sizeof(int));
+	COPY_POINTER_FIELD(indexcollations, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opfamily, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opcintype, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(sortopfamily, from->ncolumns * sizeof(Oid));
+	COPY_NODE_FIELD(indextlist);
+
+	/* Convert index exprs on child expr to expr on parent */
+	foreach(lc, newnode->indextlist)
+	{
+		TargetEntry *tle = lfirst_node(TargetEntry, lc);
+		/* Index on expression is ignored */
+		Assert(IsA(tle->expr, Var));
+		tle->expr = (Expr *) find_parent_var(appinfo, (Var *) tle->expr);
+		newnode->indexkeys[idx] = castNode(Var, tle->expr)->varattno;
+		idx++;
+	}
+	newnode->rel = parentrel;
+	return newnode;
+}
+
+/*
+ * adjust_partition_unique_indexlist
+ *
+ * global_unique_indexes: initially this contains the copied & modified
+ * unique indexes from the first partition.  We then check whether each index
+ * in it still exists in the following partitions; if not, it is removed.
+ * In the end, it holds only the indexes which exist in all the partitions.
+ */
+static void
+adjust_partition_unique_indexlist(PlannerInfo *root,
+								  RelOptInfo *parentrel,
+								  RelOptInfo *childrel,
+								  List **global_unique_indexes)
+{
+	ListCell	*lc, *lc2;
+	foreach(lc, *global_unique_indexes)
+	{
+		IndexOptInfo	*g_ind = lfirst_node(IndexOptInfo, lc);
+		bool found_in_child = false;
+
+		foreach(lc2, childrel->indexlist)
+		{
+			IndexOptInfo   *p_ind = lfirst_node(IndexOptInfo, lc2);
+			IndexOptInfo   *p_ind_copy;
+			if (!p_ind->unique || !p_ind->immediate ||
+				(p_ind->indpred != NIL && !p_ind->predOK))
+				continue;
+			p_ind_copy = simple_copy_indexinfo_to_parent(root, parentrel, p_ind);
+			if (simple_indexinfo_equal(p_ind_copy, g_ind))
+			{
+				found_in_child = true;
+				break;
+			}
+		}
+		if (!found_in_child)
+			/* The index doesn't exist in childrel, remove it from global_unique_indexes */
+			*global_unique_indexes = foreach_delete_current(*global_unique_indexes, lc);
+	}
+}
+
+/* Helper function for grouprel/distinctrel */
+static void
+add_uniquekey_from_sortgroups(PlannerInfo *root, RelOptInfo *rel, List *sortgroups)
+{
+	Query *parse = root->parse;
+	List	*exprs;
+
+	/*
+	 * XXX: If there are vars from an upper query level here, the semantics
+	 * are imprecise; should we avoid this case or not?  levelsup = 1 is just
+	 * a demo; maybe we need to check every level other than 0, in which case
+	 * it looks like we'd have to write another pull_var walker.
+	 */
+	List	*upper_vars = pull_vars_of_level((Node*)sortgroups, 1);
+
+	if (upper_vars != NIL)
+		return;
+
+	exprs = get_sortgrouplist_exprs(sortgroups, parse->targetList);
+	rel->uniquekeys = lappend(rel->uniquekeys,
+							  makeUniqueKey(exprs,
+											false /* sortgroupclause can't be multi_nullvals */));
+}
+
+
+/*
+ * add_combined_uniquekey
+ * The combination of the two UniqueKeys is a valid UniqueKey for the joinrel
+ * regardless of the jointype.  Returns true if the joinrel got a one-row
+ * UniqueKey, in which case no other UniqueKeys are needed.
+ */
+bool
+add_combined_uniquekey(RelOptInfo *joinrel,
+					   RelOptInfo *outer_rel,
+					   RelOptInfo *inner_rel,
+					   UniqueKey *outer_ukey,
+					   UniqueKey *inner_ukey,
+					   JoinType jointype)
+{
+	ListCell	*lc1, *lc2;
+
+	/*
+	 * If either side has multi_nullvals or this is an outer join, the
+	 * combined UniqueKey has multi_nullvals.
+	 */
+	bool multi_nullvals = outer_ukey->multi_nullvals ||
+		inner_ukey->multi_nullvals || IS_OUTER_JOIN(jointype);
+
+	/* The only case where we can get a one-row joinrel after the join */
+	if (relation_is_onerow(outer_rel)
+		 && relation_is_onerow(inner_rel)
+		 && jointype == JOIN_INNER)
+	{
+		add_uniquekey_for_onerow(joinrel);
+		return true;
+	}
+
+	foreach(lc1, get_exprs_from_uniquekey(joinrel, outer_rel, outer_ukey))
+	{
+		foreach(lc2, get_exprs_from_uniquekey(joinrel, inner_rel, inner_ukey))
+		{
+			List *exprs = list_concat_copy(lfirst_node(List, lc1), lfirst_node(List, lc2));
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+										  makeUniqueKey(exprs,
+														multi_nullvals));
+		}
+	}
+	return false;
+}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b406d41e91..0551ae0512 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2389,6 +2389,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		add_path(final_rel, path);
 	}
 
+	simple_copy_uniquekeys(current_rel, final_rel);
+
 	/*
 	 * Generate partial paths for final_rel, too, if outer query levels might
 	 * be able to make use of them.
@@ -3899,6 +3901,8 @@ create_grouping_paths(PlannerInfo *root,
 	}
 
 	set_cheapest(grouped_rel);
+
+	populate_grouprel_uniquekeys(root, grouped_rel, input_rel);
 	return grouped_rel;
 }
 
@@ -4615,7 +4619,7 @@ create_window_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(window_rel);
-
+	simple_copy_uniquekeys(input_rel, window_rel);
 	return window_rel;
 }
 
@@ -4911,7 +4915,7 @@ create_distinct_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(distinct_rel);
-
+	populate_distinctrel_uniquekeys(root, input_rel, distinct_rel);
 	return distinct_rel;
 }
 
@@ -5172,6 +5176,8 @@ create_ordered_paths(PlannerInfo *root,
 	 */
 	Assert(ordered_rel->pathlist != NIL);
 
+	simple_copy_uniquekeys(input_rel, ordered_rel);
+
 	return ordered_rel;
 }
 
@@ -6049,6 +6055,9 @@ adjust_paths_for_srfs(PlannerInfo *root, RelOptInfo *rel,
 	if (list_length(targets) == 1)
 		return;
 
+	/* UniqueKey is not valid after handling the SRF. */
+	rel->uniquekeys = NIL;
+
 	/*
 	 * Stack SRF-evaluation nodes atop each path for the rel.
 	 *
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 951aed80e7..e94e92937c 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -689,6 +689,8 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	/* Undo effects of possibly forcing tuple_fraction to 0 */
 	root->tuple_fraction = save_fraction;
 
+	/* Add the UniqueKeys */
+	populate_unionrel_uniquekeys(root, result_rel);
 	return result_rel;
 }
 
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index d722063cf3..44c37ecffc 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -746,3 +746,47 @@ find_appinfos_by_relids(PlannerInfo *root, Relids relids, int *nappinfos)
 	}
 	return appinfos;
 }
+
+/*
+ * find_appinfo_by_child
+ *
+ * Find the AppendRelInfo whose child_relid equals child_index.
+ */
+AppendRelInfo *
+find_appinfo_by_child(PlannerInfo *root, Index child_index)
+{
+	ListCell	*lc;
+	foreach(lc, root->append_rel_list)
+	{
+		AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
+		if (appinfo->child_relid == child_index)
+			return appinfo;
+	}
+	elog(ERROR, "parent relation can't be found");
+	return NULL;
+}
+
+/*
+ * find_parent_var
+ *
+ * Translate a child Var to the corresponding parent Var by looking it up in
+ * appinfo->translated_vars.
+ */
+Var *
+find_parent_var(AppendRelInfo *appinfo, Var *child_var)
+{
+	ListCell	*lc;
+	Var	*res = NULL;
+	Index attno = 1;
+	foreach(lc, appinfo->translated_vars)
+	{
+		Node *child_node = lfirst(lc);
+		if (equal(child_node, child_var))
+		{
+			res = copyObject(child_var);
+			res->varattno = attno;
+			res->varno = appinfo->parent_relid;
+		}
+		attno++;
+	}
+	if (res == NULL)
+		elog(ERROR, "parent var can't be found");
+	return res;
+}
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 3132fd35a5..d66b40ec50 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -736,6 +736,7 @@ apply_child_basequals(PlannerInfo *root, RelOptInfo *parentrel,
 		{
 			Node	   *onecq = (Node *) lfirst(lc2);
 			bool		pseudoconstant;
+			RestrictInfo	*child_rinfo;
 
 			/* check for pseudoconstant (no Vars or volatile functions) */
 			pseudoconstant =
@@ -747,13 +748,14 @@ apply_child_basequals(PlannerInfo *root, RelOptInfo *parentrel,
 				root->hasPseudoConstantQuals = true;
 			}
 			/* reconstitute RestrictInfo with appropriate properties */
-			childquals = lappend(childquals,
-								 make_restrictinfo((Expr *) onecq,
-												   rinfo->is_pushed_down,
-												   rinfo->outerjoin_delayed,
-												   pseudoconstant,
-												   rinfo->security_level,
-												   NULL, NULL, NULL));
+			child_rinfo =  make_restrictinfo((Expr *) onecq,
+											 rinfo->is_pushed_down,
+											 rinfo->outerjoin_delayed,
+											 pseudoconstant,
+											 rinfo->security_level,
+											 NULL, NULL, NULL);
+			child_rinfo->mergeopfamilies = rinfo->mergeopfamilies;
+			childquals = lappend(childquals, child_rinfo);
 			/* track minimum security level among child quals */
 			cq_min_security = Min(cq_min_security, rinfo->security_level);
 		}
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 31d9aedeeb..c83f17acb7 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -16,6 +16,7 @@
 
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 
 extern A_Expr *makeA_Expr(A_Expr_Kind kind, List *name,
@@ -105,4 +106,6 @@ extern GroupingSet *makeGroupingSet(GroupingSetKind kind, List *content, int loc
 
 extern VacuumRelation *makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols);
 
+extern UniqueKey* makeUniqueKey(List *exprs, bool multi_nullvals);
+
 #endif							/* MAKEFUNC_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b4e4..41110ed888 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -264,6 +264,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 9e3ebd488a..02e4458bef 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -730,6 +730,7 @@ typedef struct RelOptInfo
 	QualCost	baserestrictcost;	/* cost of evaluating the above */
 	Index		baserestrict_min_security;	/* min security_level found in
 											 * baserestrictinfo */
+	List	   *uniquekeys;		/* List of UniqueKey */
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
 	bool		has_eclass_joins;	/* T means joininfo is incomplete */
@@ -1047,6 +1048,28 @@ typedef struct PathKey
 } PathKey;
 
 
+/*
+ * UniqueKey
+ *
+ * Represents the unique properties held by a RelOptInfo.
+ *
+ * exprs is a list of expressions that are unique for the current RelOptInfo.
+ * exprs = NIL is a special case of UniqueKey which means there is only one
+ * row in the relation.
+ * multi_nullvals: true means multiple null values may exist in these exprs,
+ * so uniqueness is not guaranteed in that case.  The field is needed for
+ * remove_useless_joins & reduce_unique_semijoins, where such duplicated NULL
+ * values do not matter.  It is set to true in two cases: a unique key built
+ * from a unique index whose columns are nullable, and the outer-join case.
+ * See populate_joinrel_uniquekeys for details.
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+	List	   *exprs;
+	bool		multi_nullvals;
+} UniqueKey;
+
 /*
  * PathTarget
  *
@@ -2473,7 +2496,7 @@ typedef enum
  *
  * flags indicating what kinds of grouping are possible.
  * partial_costs_set is true if the agg_partial_costs and agg_final_costs
- * 		have been initialized.
+ *		have been initialized.
  * agg_partial_costs gives partial aggregation costs.
  * agg_final_costs gives finalization costs.
  * target_parallel_safe is true if target is parallel safe.
@@ -2503,8 +2526,8 @@ typedef struct
  * limit_tuples is an estimated bound on the number of output tuples,
  *		or -1 if no LIMIT or couldn't estimate.
  * count_est and offset_est are the estimated values of the LIMIT and OFFSET
- * 		expressions computed by preprocess_limit() (see comments for
- * 		preprocess_limit() for more information).
+ *		expressions computed by preprocess_limit() (see comments for
+ *		preprocess_limit() for more information).
  */
 typedef struct
 {
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 14ea2766ad..621f54a9f8 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -528,6 +528,8 @@ extern bool list_member_ptr(const List *list, const void *datum);
 extern bool list_member_int(const List *list, int datum);
 extern bool list_member_oid(const List *list, Oid datum);
 
+extern bool list_is_subset(const List *members, const List *target);
+
 extern List *list_delete(List *list, void *datum);
 extern List *list_delete_ptr(List *list, void *datum);
 extern List *list_delete_int(List *list, int datum);
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index d6a27a60dd..e87c92a054 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -32,4 +32,7 @@ extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 extern AppendRelInfo **find_appinfos_by_relids(PlannerInfo *root,
 											   Relids relids, int *nappinfos);
 
+extern AppendRelInfo *find_appinfo_by_child(PlannerInfo *root, Index child_index);
+extern Var *find_parent_var(AppendRelInfo *appinfo, Var *child_var);
+
 #endif							/* APPENDINFO_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 3e4171056e..9445141263 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -23,6 +23,7 @@
 #define OPTIMIZER_H
 
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 /*
  * We don't want to include nodes/pathnodes.h here, because non-planner
@@ -156,6 +157,7 @@ extern TargetEntry *get_sortgroupref_tle(Index sortref,
 										 List *targetList);
 extern TargetEntry *get_sortgroupclause_tle(SortGroupClause *sgClause,
 											List *targetList);
+extern Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle);
 extern Node *get_sortgroupclause_expr(SortGroupClause *sgClause,
 									  List *targetList);
 extern List *get_sortgrouplist_exprs(List *sgClauses,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 10b6e81079..9217a8d6c6 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -240,5 +240,48 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 									   int strategy, bool nulls_first);
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
+extern List *select_mergejoin_clauses(PlannerInfo *root,
+									  RelOptInfo *joinrel,
+									  RelOptInfo *outerrel,
+									  RelOptInfo *innerrel,
+									  List *restrictlist,
+									  JoinType jointype,
+									  bool *mergejoin_allowed);
+
+/*
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ */
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel,
+										List* unique_index_list);
+extern void populate_partitionedrel_uniquekeys(PlannerInfo *root,
+												RelOptInfo *rel,
+												List *childrels);
+extern void populate_distinctrel_uniquekeys(PlannerInfo *root,
+											RelOptInfo *inputrel,
+											RelOptInfo *distinctrel);
+extern void populate_grouprel_uniquekeys(PlannerInfo *root,
+										 RelOptInfo *grouprel,
+										 RelOptInfo *inputrel);
+extern void populate_unionrel_uniquekeys(PlannerInfo *root,
+										  RelOptInfo *unionrel);
+extern void simple_copy_uniquekeys(RelOptInfo *oldrel,
+								   RelOptInfo *newrel);
+extern void convert_subquery_uniquekeys(PlannerInfo *root,
+										RelOptInfo *currel,
+										RelOptInfo *sub_final_rel);
+extern void populate_joinrel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *rel1,
+										RelOptInfo *rel2,
+										List *restrictlist,
+										JoinType jointype);
+
+extern bool relation_has_uniquekeys_for(PlannerInfo *root,
+										RelOptInfo *rel,
+										List *exprs,
+										bool allow_multinulls);
+extern bool relation_is_onerow(RelOptInfo *rel);
 
 #endif							/* PATHS_H */
-- 
2.21.0

v10-0005-Treat-the-input-as-sorted-if-the-group-by-clause.patchapplication/octet-stream; name=v10-0005-Treat-the-input-as-sorted-if-the-group-by-clause.patchDownload
From 43ed2d635a7e3d4ba05e95c9dc47861084f149b5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 6 May 2020 16:32:28 +0800
Subject: [PATCH v10 5/6] Treat the input as sorted if the group by clause is
 unique already.

In the previous commit, the Group node is removed entirely when there is
no aggregate call, so this patch only covers the cases where aggregate
calls exist.
---
 src/backend/commands/explain.c           |   4 +
 src/backend/executor/nodeAgg.c           |   6 ++
 src/backend/optimizer/plan/createplan.c  |   3 +-
 src/backend/optimizer/plan/planner.c     | 127 +++++++++++++++--------
 src/include/nodes/nodes.h                |   3 +-
 src/include/nodes/pathnodes.h            |   1 +
 src/include/nodes/plannodes.h            |   1 +
 src/test/regress/expected/aggregates.out |  42 ++++++++
 src/test/regress/sql/aggregates.sql      |  17 +++
 9 files changed, 159 insertions(+), 45 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a283e4d45c..7ec9f06146 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1930,6 +1930,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			show_agg_keys(castNode(AggState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			show_hashagg_info((AggState *) planstate, es);
+			if (es->format != EXPLAIN_FORMAT_TEXT ||
+				(es->verbose && ((Agg *) plan)->input_unique))
+				ExplainPropertyBool("Input Unique",
+									((Agg *) plan)->input_unique, es);
 			if (plan->qual)
 				show_instrumentation_count("Rows Removed by Filter", 1,
 										   planstate, es);
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index b79c845a6b..ab9ab1175d 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -2172,6 +2172,12 @@ ExecAgg(PlanState *pstate)
 			case AGG_SORTED:
 				result = agg_retrieve_direct(node);
 				break;
+			case AGG_UNIQUE:
+				/* AGG_UNIQUE is translated to AGG_SORTED in make_agg();
+				 * handle it here only to keep the compiler quiet.
+				 */
+				Assert(false);
+				break;
 		}
 
 		if (!TupIsNull(result))
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 99278eed93..c7481436e4 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -6355,7 +6355,7 @@ make_agg(List *tlist, List *qual,
 	/* Reduce to long, but 'ware overflow! */
 	numGroups = (long) Min(dNumGroups, (double) LONG_MAX);
 
-	node->aggstrategy = aggstrategy;
+	node->aggstrategy = aggstrategy == AGG_UNIQUE ? AGG_SORTED : aggstrategy;
 	node->aggsplit = aggsplit;
 	node->numCols = numGroupCols;
 	node->grpColIdx = grpColIdx;
@@ -6366,6 +6366,7 @@ make_agg(List *tlist, List *qual,
 	node->aggParams = NULL;		/* SS_finalize_plan() will fill this */
 	node->groupingSets = groupingSets;
 	node->chain = chain;
+	node->input_unique = aggstrategy == AGG_UNIQUE;
 
 	plan->qual = qual;
 	plan->targetlist = tlist;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e25407cf58..88f040dce1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3852,51 +3852,61 @@ create_grouping_paths(PlannerInfo *root,
 		int			flags = 0;
 		GroupPathExtraData extra;
 
-		/*
-		 * Determine whether it's possible to perform sort-based
-		 * implementations of grouping.  (Note that if groupClause is empty,
-		 * grouping_is_sortable() is trivially true, and all the
-		 * pathkeys_contained_in() tests will succeed too, so that we'll
-		 * consider every surviving input path.)
-		 *
-		 * If we have grouping sets, we might be able to sort some but not all
-		 * of them; in this case, we need can_sort to be true as long as we
-		 * must consider any sorted-input plan.
-		 */
-		if ((gd && gd->rollups != NIL)
-			|| grouping_is_sortable(parse->groupClause))
-			flags |= GROUPING_CAN_USE_SORT;
+		if (group_unique_input)
+		{
+			/* In this case we don't need to set other flags */
+			Assert(parse->groupClause != NIL);
+			Assert(gd == NULL);
+			flags |= GROUPING_INPUT_UNIQUE;
+		}
+		else
+		{
+			/*
+			 * Determine whether it's possible to perform sort-based
+			 * implementations of grouping.  (Note that if groupClause is empty,
+			 * grouping_is_sortable() is trivially true, and all the
+			 * pathkeys_contained_in() tests will succeed too, so that we'll
+			 * consider every surviving input path.)
+			 *
+			 * If we have grouping sets, we might be able to sort some but not all
+			 * of them; in this case, we need can_sort to be true as long as we
+			 * must consider any sorted-input plan.
+			 */
+			if ((gd && gd->rollups != NIL)
+				|| grouping_is_sortable(parse->groupClause))
+				flags |= GROUPING_CAN_USE_SORT;
 
-		/*
-		 * Determine whether we should consider hash-based implementations of
-		 * grouping.
-		 *
-		 * Hashed aggregation only applies if we're grouping. If we have
-		 * grouping sets, some groups might be hashable but others not; in
-		 * this case we set can_hash true as long as there is nothing globally
-		 * preventing us from hashing (and we should therefore consider plans
-		 * with hashes).
-		 *
-		 * Executor doesn't support hashed aggregation with DISTINCT or ORDER
-		 * BY aggregates.  (Doing so would imply storing *all* the input
-		 * values in the hash table, and/or running many sorts in parallel,
-		 * either of which seems like a certain loser.)  We similarly don't
-		 * support ordered-set aggregates in hashed aggregation, but that case
-		 * is also included in the numOrderedAggs count.
-		 *
-		 * Note: grouping_is_hashable() is much more expensive to check than
-		 * the other gating conditions, so we want to do it last.
-		 */
-		if ((parse->groupClause != NIL &&
-			 agg_costs->numOrderedAggs == 0 &&
-			 (gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
-			flags |= GROUPING_CAN_USE_HASH;
+			/*
+			 * Determine whether we should consider hash-based implementations of
+			 * grouping.
+			 *
+			 * Hashed aggregation only applies if we're grouping. If we have
+			 * grouping sets, some groups might be hashable but others not; in
+			 * this case we set can_hash true as long as there is nothing globally
+			 * preventing us from hashing (and we should therefore consider plans
+			 * with hashes).
+			 *
+			 * Executor doesn't support hashed aggregation with DISTINCT or ORDER
+			 * BY aggregates.  (Doing so would imply storing *all* the input
+			 * values in the hash table, and/or running many sorts in parallel,
+			 * either of which seems like a certain loser.)  We similarly don't
+			 * support ordered-set aggregates in hashed aggregation, but that case
+			 * is also included in the numOrderedAggs count.
+			 *
+			 * Note: grouping_is_hashable() is much more expensive to check than
+			 * the other gating conditions, so we want to do it last.
+			 */
+			if ((parse->groupClause != NIL &&
+				 agg_costs->numOrderedAggs == 0 &&
+				 (gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
+				flags |= GROUPING_CAN_USE_HASH;
 
-		/*
-		 * Determine whether partial aggregation is possible.
-		 */
-		if (can_partial_agg(root, agg_costs))
-			flags |= GROUPING_CAN_PARTIAL_AGG;
+			/*
+			 * Determine whether partial aggregation is possible.
+			 */
+			if (can_partial_agg(root, agg_costs))
+				flags |= GROUPING_CAN_PARTIAL_AGG;
+		}
 
 		extra.flags = flags;
 		extra.target_parallel_safe = target_parallel_safe;
@@ -6522,9 +6532,40 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	ListCell   *lc;
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+	bool		group_input_unique = (extra->flags & GROUPING_INPUT_UNIQUE) != 0;
 	List	   *havingQual = (List *) extra->havingQual;
 	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
 
+	if (group_input_unique)
+	{
+		Path *path = input_rel->cheapest_total_path;
+		add_path(grouped_rel, (Path *) create_agg_path(root,
+													   grouped_rel,
+													   path,
+													   grouped_rel->reltarget,
+													   AGG_UNIQUE,
+													   AGGSPLIT_SIMPLE,
+													   parse->groupClause,
+													   havingQual,
+													   agg_costs,
+													   dNumGroups));
+
+		if (path != input_rel->cheapest_startup_path)
+		{
+			path = input_rel->cheapest_startup_path;
+			add_path(grouped_rel, (Path *) create_agg_path(root,
+														   grouped_rel,
+														   path,
+														   grouped_rel->reltarget,
+														   AGG_UNIQUE,
+														   AGGSPLIT_SIMPLE,
+														   parse->groupClause,
+														   havingQual,
+														   agg_costs,
+														   dNumGroups));
+		}
+		return;
+	}
 	if (can_sort)
 	{
 		/*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 41110ed888..010266ed4b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -761,7 +761,8 @@ typedef enum AggStrategy
 	AGG_PLAIN,					/* simple agg across all input rows */
 	AGG_SORTED,					/* grouped agg, input must be sorted */
 	AGG_HASHED,					/* grouped agg, use internal hashtable */
-	AGG_MIXED					/* grouped agg, hash and sort both used */
+	AGG_MIXED,					/* grouped agg, hash and sort both used */
+	AGG_UNIQUE					/* grouped agg, the group clause is unique */
 } AggStrategy;
 
 /*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 02e4458bef..b4bb4e6267 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2472,6 +2472,7 @@ typedef struct JoinPathExtraData
 #define GROUPING_CAN_USE_SORT       0x0001
 #define GROUPING_CAN_USE_HASH       0x0002
 #define GROUPING_CAN_PARTIAL_AGG	0x0004
+#define GROUPING_INPUT_UNIQUE		0x0008
 
 /*
  * What kind of partitionwise aggregation is in use?
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 83e01074ed..be31020a40 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -828,6 +828,7 @@ typedef struct Agg
 	/* Note: planner provides numGroups & aggParams only in HASHED/MIXED case */
 	List	   *groupingSets;	/* grouping sets to use */
 	List	   *chain;			/* chained Agg/Sort nodes */
+	bool		input_unique;	/* the input is already unique */
 } Agg;
 
 /* ----------------
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 87fef9f417..338e71e06d 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2699,6 +2699,46 @@ set work_mem to default;
 ----+----+----
 (0 rows)
 
+create table agg_unique_1(pk int primary key,  b int);
+create table agg_unique_2(a int, unsortable_col xid);
+insert into agg_unique_2 values(1, '1'), (2, '2'), (2, '1');
+explain (costs off, verbose)  select pk, sum(b) from agg_unique_1
+group by pk;
+              QUERY PLAN               
+---------------------------------------
+ GroupAggregate
+   Output: pk, sum(b)
+   Group Key: agg_unique_1.pk
+   Input Unique: true
+   ->  Seq Scan on public.agg_unique_1
+         Output: pk, b
+(6 rows)
+
+explain (costs off, verbose) select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ GroupAggregate
+   Output: agg_unique_2.unsortable_col, count(*)
+   Group Key: agg_unique_2.unsortable_col
+   Input Unique: true
+   ->  HashAggregate
+         Output: agg_unique_2.unsortable_col
+         Group Key: agg_unique_2.unsortable_col
+         ->  Seq Scan on public.agg_unique_2
+               Output: agg_unique_2.a, agg_unique_2.unsortable_col
+(9 rows)
+
+select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+ unsortable_col | count 
+----------------+-------
+              2 |     1
+              1 |     1
+(2 rows)
+
 drop table agg_group_1;
 drop table agg_group_2;
 drop table agg_group_3;
@@ -2707,3 +2747,5 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_unique_1;
+drop table agg_unique_2;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index ad025206fb..439eec18ac 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1202,6 +1202,21 @@ set work_mem to default;
   union all
 (select * from agg_group_4 except select * from agg_hash_4);
 
+create table agg_unique_1(pk int primary key,  b int);
+create table agg_unique_2(a int, unsortable_col xid);
+insert into agg_unique_2 values(1, '1'), (2, '2'), (2, '1');
+
+explain (costs off, verbose)  select pk, sum(b) from agg_unique_1
+group by pk;
+
+explain (costs off, verbose) select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+
+select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+
 drop table agg_group_1;
 drop table agg_group_2;
 drop table agg_group_3;
@@ -1210,3 +1225,5 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_unique_1;
+drop table agg_unique_2;
-- 
2.21.0

v10-0003-Refactoring-existing-uniqueness-related-code-to-.patchapplication/octet-stream; name=v10-0003-Refactoring-existing-uniqueness-related-code-to-.patchDownload
From 1e7c765f42b6a9b1f69970970cf422582dff16b6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 6 May 2020 15:44:15 +0800
Subject: [PATCH v10 3/6] Refactoring existing uniqueness related code to use
 UniqueKey

The calls to remove_useless_joins and reduce_unique_semijoins are
postponed as a result of this as well.
---
 .../postgres_fdw/expected/postgres_fdw.out    |  32 ++--
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   1 +
 src/backend/optimizer/path/allpaths.c         |  18 ++-
 src/backend/optimizer/plan/analyzejoins.c     | 137 ++++--------------
 src/backend/optimizer/plan/planmain.c         |  13 --
 src/test/regress/expected/join.out            |  59 ++++----
 src/test/regress/sql/join.sql                 |  16 +-
 7 files changed, 97 insertions(+), 179 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 90db550b92..d71b6cc556 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1378,6 +1378,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1
 (8 rows)
 
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY: The inner table in the plan is "T 3", which has a primary key on c1; that is why the plan now shows the newly added "Inner Unique: true".
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
                                                                                                                                                                                              QUERY PLAN                                                                                                                                                                                             
@@ -1386,6 +1387,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
    Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
    ->  Nested Loop
          Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+         Inner Unique: true
          ->  Foreign Scan
                Output: ft4.c1, ft4.*, ft5.c1, ft5.*
                Relations: (public.ft4) FULL JOIN (public.ft5)
@@ -1410,7 +1412,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
                ->  Seq Scan on "S 1"."T 3"
                      Output: "T 3".c1, "T 3".ctid
                      Filter: ("T 3".c1 = 50)
-(28 rows)
+(29 rows)
 
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
  c1 | a  | b  
@@ -2902,22 +2904,20 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 f
 -- Outer query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
- Unique
+                                                       QUERY PLAN                                                       
+------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: ((SubPlan 1))
-   ->  Sort
-         Output: ((SubPlan 1))
-         Sort Key: ((SubPlan 1))
-         ->  Foreign Scan
-               Output: (SubPlan 1)
-               Relations: Aggregate on (public.ft2 t2)
-               Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
-               SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+   Sort Key: ((SubPlan 1))
+   ->  Foreign Scan
+         Output: (SubPlan 1)
+         Relations: Aggregate on (public.ft2 t2)
+         Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
+         SubPlan 1
+           ->  Foreign Scan on public.ft1 t1
+                 Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                 Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(11 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83971665e3..a42cfa134d 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -453,6 +453,7 @@ EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY: The inner table in the plan is "T 3", which has a primary key on c1; that is why the plan now shows the newly added "Inner Unique: true".
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 2ad9d06d7a..6c18dd8a88 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -39,6 +39,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
@@ -222,13 +223,24 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	set_base_rel_pathlists(root);
 
 	/*
-	 * Generate access paths for the entire join tree.
+	 * Remove any useless outer joins.  Ideally this would be done during
+	 * until we've built baserel data structures, classified qual clauses,
+	 * and computed uniquekeys.
+	 * and uniquekeys
 	 */
-	rel = make_rel_from_joinlist(root, joinlist);
+	joinlist = remove_useless_joins(root, joinlist);
+
+	/*
+	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
+	 * Likewise, this can't be done until now for lack of needed info.
+	 */
+	reduce_unique_semijoins(root);
 
 	/*
-	 * The result should join all and only the query's base rels.
+	 * Generate access paths for the entire join tree.
 	 */
+	rel = make_rel_from_joinlist(root, joinlist);
+
 	Assert(bms_equal(rel->relids, root->all_baserels));
 
 	return rel;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0ff660284..d80bff65d2 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -439,6 +439,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
 	 * There may be references to the rel in root->fkey_list, but if so,
 	 * match_foreign_keys_to_quals() will get rid of them.
 	 */
+
+	root->all_baserels = bms_del_member(root->all_baserels, relid);
 }
 
 /*
@@ -584,39 +586,7 @@ reduce_unique_semijoins(PlannerInfo *root)
 static bool
 rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 {
-	/* We only know about baserels ... */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * For a plain relation, we only know how to prove uniqueness by
-		 * reference to unique indexes.  Make sure there's at least one
-		 * suitable unique index.  It must be immediately enforced, and if
-		 * it's a partial index, it must match the query.  (Keep these
-		 * conditions in sync with relation_has_unique_index_for!)
-		 */
-		ListCell   *lc;
-
-		foreach(lc, rel->indexlist)
-		{
-			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
-
-			if (ind->unique && ind->immediate &&
-				(ind->indpred == NIL || ind->predOK))
-				return true;
-		}
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Query	   *subquery = root->simple_rte_array[rel->relid]->subquery;
-
-		/* Check if the subquery has any qualities that support distinctness */
-		if (query_supports_distinctness(subquery))
-			return true;
-	}
-	/* We have no proof rules for any other rtekinds. */
-	return false;
+	return rel->uniquekeys != NIL;
 }
 
 /*
@@ -640,83 +610,33 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 static bool
 rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
 {
-	/*
-	 * We could skip a couple of tests here if we assume all callers checked
-	 * rel_supports_distinctness first, but it doesn't seem worth taking any
-	 * risk for.
-	 */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * Examine the indexes to see if we have a matching unique index.
-		 * relation_has_unique_index_for automatically adds any usable
-		 * restriction clauses for the rel, so we needn't do that here.
-		 */
-		if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
-			return true;
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Index		relid = rel->relid;
-		Query	   *subquery = root->simple_rte_array[relid]->subquery;
-		List	   *colnos = NIL;
-		List	   *opids = NIL;
-		ListCell   *l;
 
-		/*
-		 * Build the argument lists for query_is_distinct_for: a list of
-		 * output column numbers that the query needs to be distinct over, and
-		 * a list of equality operators that the output columns need to be
-		 * distinct according to.
-		 *
-		 * (XXX we are not considering restriction clauses attached to the
-		 * subquery; is that worth doing?)
-		 */
-		foreach(l, clause_list)
+	ListCell   *lc1, *lc2, *lc3;
+	foreach(lc1, rel->uniquekeys)
+	{
+		UniqueKey  *uqk = lfirst_node(UniqueKey, lc1);
+		bool		all_uqk_exprs_found = true;
+		foreach(lc2, uqk->exprs)
 		{
-			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
-			Oid			op;
-			Var		   *var;
-
-			/*
-			 * Get the equality operator we need uniqueness according to.
-			 * (This might be a cross-type operator and thus not exactly the
-			 * same operator the subquery would consider; that's all right
-			 * since query_is_distinct_for can resolve such cases.)  The
-			 * caller's mergejoinability test should have selected only
-			 * OpExprs.
-			 */
-			op = castNode(OpExpr, rinfo->clause)->opno;
-
-			/* caller identified the inner side for us */
-			if (rinfo->outer_is_left)
-				var = (Var *) get_rightop(rinfo->clause);
-			else
-				var = (Var *) get_leftop(rinfo->clause);
-
-			/*
-			 * We may ignore any RelabelType node above the operand.  (There
-			 * won't be more than one, since eval_const_expressions() has been
-			 * applied already.)
-			 */
-			if (var && IsA(var, RelabelType))
-				var = (Var *) ((RelabelType *) var)->arg;
-
-			/*
-			 * If inner side isn't a Var referencing a subquery output column,
-			 * this clause doesn't help us.
-			 */
-			if (!var || !IsA(var, Var) ||
-				var->varno != relid || var->varlevelsup != 0)
-				continue;
-
-			colnos = lappend_int(colnos, var->varattno);
-			opids = lappend_oid(opids, op);
+			Node	   *uq_expr = lfirst(lc2);
+			bool		found_in_clause_list = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node	   *clause_expr = rinfo->outer_is_left ?
+					get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				if (equal(uq_expr, clause_expr))
+				{
+					found_in_clause_list = true;
+					break;
+				}
+			}
+			if (!found_in_clause_list)
+			{
+				all_uqk_exprs_found = false;
+				break;
+			}
 		}
-
-		if (query_is_distinct_for(subquery, colnos, opids))
+		if (all_uqk_exprs_found)
 			return true;
 	}
 	return false;
@@ -972,6 +892,9 @@ innerrel_is_unique(PlannerInfo *root,
 	MemoryContext old_context;
 	ListCell   *lc;
 
+	if (relation_is_onerow(innerrel))
+		return true;
+
 	/* Certainly can't prove uniqueness when there are no joinclauses */
 	if (restrictlist == NIL)
 		return false;
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..6ad73cb57b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -213,19 +213,6 @@ query_planner(PlannerInfo *root,
 	 */
 	fix_placeholder_input_needed_levels(root);
 
-	/*
-	 * Remove any useless outer joins.  Ideally this would be done during
-	 * jointree preprocessing, but the necessary information isn't available
-	 * until we've built baserel data structures and classified qual clauses.
-	 */
-	joinlist = remove_useless_joins(root, joinlist);
-
-	/*
-	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
-	 * Likewise, this can't be done until now for lack of needed info.
-	 */
-	reduce_unique_semijoins(root);
-
 	/*
 	 * Now distribute "placeholders" to base rels as needed.  This has to be
 	 * done after join removal because removal could change whether a
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a46b1573bd..8378936eda 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4349,11 +4349,11 @@ reset enable_nestloop;
 --
 begin;
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 -- all three cases should be optimizable into a simple seqscan
@@ -4411,40 +4411,37 @@ select d.* from d left join (select distinct * from b) s
 (1 row)
 
 -- join removal is not possible when the GROUP BY contains a column that is
--- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
--- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
--- but this happens too late for join removal in the outer plan level.)
+-- not in the join condition.
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
-                QUERY PLAN                
-------------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Group
-         Group Key: b.id
-         ->  Index Scan using b_pkey on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.d, b.c_id
+                     ->  Seq Scan on b
 (8 rows)
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(9 rows)
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.c_id, b.d
+                     ->  Seq Scan on b
+(8 rows)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 1403e0ffe7..3312542411 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1479,11 +1479,11 @@ reset enable_nestloop;
 begin;
 
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 
@@ -1512,17 +1512,15 @@ select d.* from d left join (select distinct * from b) s
   on d.a = s.id and d.b = s.c_id;
 
 -- join removal is not possible when the GROUP BY contains a column that is
--- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
--- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
--- but this happens too late for join removal in the outer plan level.)
+-- not in the join condition.
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
-- 
2.21.0

Attachment: v10-0004-Remove-distinct-node-AggNode-if-the-input-is-uni.patch (application/octet-stream)
From b3adf46d282cb0a382d3a1a8dcdccefc579df682 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 11 May 2020 15:54:49 +0800
Subject: [PATCH v10 4/6] Remove distinct node & AggNode if the input is unique
 already.

---
 src/backend/optimizer/plan/planner.c          |  30 +-
 src/test/regress/expected/aggregates.out      |  73 ++--
 src/test/regress/expected/select_distinct.out | 397 ++++++++++++++++++
 src/test/regress/sql/aggregates.sql           |   6 +-
 src/test/regress/sql/select_distinct.sql      | 146 +++++++
 5 files changed, 602 insertions(+), 50 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0551ae0512..e25407cf58 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3812,7 +3812,27 @@ create_grouping_paths(PlannerInfo *root,
 {
 	Query	   *parse = root->parse;
 	RelOptInfo *grouped_rel;
+	bool	group_unique_input = false;
 	RelOptInfo *partially_grouped_rel;
+	List	*groupExprs = NIL;
+
+	if (root->parse->groupingSets == NIL)
+	{
+		groupExprs = get_sortgrouplist_exprs(parse->groupClause,
+											 parse->targetList);
+		/*
+		 * If the GROUP BY clause is unique already, the grouping node is
+		 * not necessary when there are no aggregate functions.
+		 */
+		group_unique_input = relation_has_uniquekeys_for(root, input_rel,
+														 groupExprs, false);
+		if (group_unique_input &&
+			groupExprs != NIL &&
+			!parse->hasAggs &&
+			!parse->hasWindowFuncs &&
+			parse->havingQual == NULL)
+			return input_rel;
+	}
 
 	/*
 	 * Create grouping relation to hold fully aggregated grouping and/or
@@ -4738,6 +4758,12 @@ create_distinct_paths(PlannerInfo *root,
 	bool		allow_hash;
 	Path	   *path;
 	ListCell   *lc;
+	List	   *distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
+														 parse->targetList);
+
+	/* If the result is unique already, we just return the input_rel directly */
+	if (relation_has_uniquekeys_for(root, input_rel, distinctExprs, false))
+		return input_rel;
 
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
@@ -4775,10 +4801,6 @@ create_distinct_paths(PlannerInfo *root,
 		/*
 		 * Otherwise, the UNIQUE filter has effects comparable to GROUP BY.
 		 */
-		List	   *distinctExprs;
-
-		distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
-												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
 											  NULL);
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 3bd184ae29..87fef9f417 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -973,14 +973,12 @@ explain (costs off)
   select distinct max(unique2) from tenk1;
                              QUERY PLAN                              
 ---------------------------------------------------------------------
- HashAggregate
-   Group Key: $0
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
                  Index Cond: (unique2 IS NOT NULL)
-   ->  Result
-(7 rows)
+(5 rows)
 
 select distinct max(unique2) from tenk1;
  max  
@@ -1139,7 +1137,7 @@ explain (costs off)
   select distinct min(f1), max(f1) from minmaxtest;
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
- Unique
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Merge Append
@@ -1162,10 +1160,7 @@ explain (costs off)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
-   ->  Sort
-         Sort Key: ($0), ($1)
-         ->  Result
-(26 rows)
+(23 rows)
 
 select distinct min(f1), max(f1) from minmaxtest;
  min | max 
@@ -1195,12 +1190,10 @@ create temp table t2 (x int, y int, z int, primary key (x, y));
 create temp table t3 (a int, b int, c int, primary key(a, b) deferrable);
 -- Non-primary-key columns can be removed from GROUP BY
 explain (costs off) select * from t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 -- No removal can happen if the complete PK is not present in GROUP BY
 explain (costs off) select a,c from t1 group by a,c,d;
@@ -1212,29 +1205,27 @@ explain (costs off) select a,c from t1 group by a,c,d;
 (3 rows)
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.y
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
                       QUERY PLAN                      
 ------------------------------------------------------
  HashAggregate
    Group Key: t1.a, t1.b, t2.x, t2.z
-   ->  Hash Join
+   ->  Hash Left Join
          Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
          ->  Seq Scan on t2
          ->  Hash
@@ -1264,12 +1255,10 @@ explain (costs off) select * from t1 group by a,b,c,d;
 
 -- Okay to remove columns if we're only querying the parent.
 explain (costs off) select * from only t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 create temp table p_t1 (
   a int,
@@ -1282,14 +1271,12 @@ create temp table p_t1_1 partition of p_t1 for values in(1);
 create temp table p_t1_2 partition of p_t1 for values in(2);
 -- Ensure we can remove non-PK columns for partitioned tables.
 explain (costs off) select * from p_t1 group by a,b,c,d;
-           QUERY PLAN           
---------------------------------
- HashAggregate
-   Group Key: p_t1.a, p_t1.b
-   ->  Append
-         ->  Seq Scan on p_t1_1
-         ->  Seq Scan on p_t1_2
-(5 rows)
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on p_t1_1
+   ->  Seq Scan on p_t1_2
+(3 rows)
 
 drop table t1 cascade;
 NOTICE:  drop cascades to table t1c
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 11c6f50fbf..da204734e2 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -306,3 +306,400 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+    QUERY PLAN    
+------------------
+ Seq Scan on uqk1
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+           QUERY PLAN            
+---------------------------------
+ HashAggregate
+   Group Key: d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+     QUERY PLAN     
+--------------------
+ Seq Scan on uqk1
+   Filter: (pk = 1)
+(2 rows)
+
+-- Test join
+-- both uqk1 (c, d) and uqk2 (pk) are unique keys, so distinct is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+           QUERY PLAN            
+---------------------------------
+ Hash Join
+   Hash Cond: (uqk1.a = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+   ->  Hash
+         ->  Seq Scan on uqk2
+(6 rows)
+
+-- Distinct is needed since the outer join allows multiple NULL values.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+                 QUERY PLAN                  
+---------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.pk
+         ->  Hash Right Join
+               Hash Cond: (uqk1.a = uqk2.pk)
+               ->  Seq Scan on uqk1
+               ->  Hash
+                     ->  Seq Scan on uqk2
+(8 rows)
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+ pk 
+----
+  1
+   
+(2 rows)
+
+-- Distinct is not needed since uqk1 is the left table in the outer join.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+           QUERY PLAN            
+---------------------------------
+ Sort
+   Sort Key: uqk1.c, uqk1.d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+ c | d 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+(3 rows)
+
+-- The combination of UniqueKeys is still unique no matter what the join clause is.
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+          QUERY PLAN          
+------------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+   ->  Materialize
+         ->  Seq Scan on uqk2
+(4 rows)
+
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+ a | pk | c | d | a | pk | c | d 
+---+----+---+---+---+----+---+---
+ 1 |  1 | 1 | 1 | 1 |  1 | 1 | 1
+ 1 |  1 | 1 | 1 | 4 |  4 | 4 | 4
+ 1 |  1 | 1 | 1 | 5 |  5 | 5 | 5
+ 2 |  2 | 2 | 2 | 1 |  1 | 1 | 1
+ 2 |  2 | 2 | 2 | 4 |  4 | 4 | 4
+ 2 |  2 | 2 | 2 | 5 |  5 | 5 | 5
+ 3 |  3 | 3 | 3 | 1 |  1 | 1 | 1
+ 3 |  3 | 3 | 3 | 4 |  4 | 4 | 4
+ 3 |  3 | 3 | 3 | 5 |  5 | 5 | 5
+(9 rows)
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+           QUERY PLAN           
+--------------------------------
+ Hash Semi Join
+   Hash Cond: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+   ->  Hash
+         ->  Seq Scan on uqk2
+(5 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+             QUERY PLAN             
+------------------------------------
+ Seq Scan on uqk1
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on uqk2
+(4 rows)
+
+-- Test the UniqueKey for the one-row case; DISTINCT is not needed either.
+-- uqk1.d is a uniquekey due to the onerow rule; uqk2.pk is the primary key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+            QUERY PLAN             
+-----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+(5 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both uqk1.d and uqk2.d are uniquekeys due to the onerow rule.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+            QUERY PLAN            
+----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(6 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both UniqueKeys are in the targetList, so DISTINCT is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+        QUERY PLAN        
+--------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(5 rows)
+
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+ c | c 
+---+---
+ 2 | 1
+(1 row)
+
+-- Test SubQuery
+-- t2 (a, c) is a unique key because of the GROUP BY
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+              QUERY PLAN               
+---------------------------------------
+ Hash Join
+   Hash Cond: ((sum(uqk2.pk)) = t1.pk)
+   ->  HashAggregate
+         Group Key: uqk2.a, uqk2.c
+         ->  Seq Scan on uqk2
+   ->  Hash
+         ->  Seq Scan on uqk1 t1
+(7 rows)
+
+-- Test partitioned TABLE
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+-- create unique index on dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP INDEX dist_p_uk_b_c;
+-- we also support creating a unique index on each child table
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- NOT OK, since dist_p1 has no such index
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ HashAggregate
+   Group Key: dist_p.a, dist_p.b, dist_p.c, dist_p.d
+   ->  Append
+         ->  Seq Scan on dist_p0 dist_p_1
+         ->  Seq Scan on dist_p1 dist_p_2
+(5 rows)
+
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP TABLE dist_p;
+-- Test partitions with a different attribute order.
+CREATE TABLE p (a INT NOT NULL, b DATE NOT NULL,c INT NOT NULL, d VARCHAR(5) NOT NULL) PARTITION BY LIST (a);
+-- First partition with different attr order.
+CREATE TABLE p1(d VARCHAR(5) NOT NULL, c INT NOT NULL, b DATE NOT NULL, a INT NOT NULL);
+ALTER TABLE p ATTACH PARTITION p1 FOR VALUES IN (1);
+-- 2nd partition
+CREATE TABLE p2 PARTITION OF p FOR VALUES IN (2);
+-- Unique index per partition, without the partition key.
+CREATE UNIQUE index p1_unq_d ON p1(d);
+CREATE UNIQUE index p2_unq_d ON p2(d);
+-- Unique index with the partition key.
+CREATE UNIQUE index p_unq_ac ON  p(a, c);
+ANALYZE p;
+-- d is unique for its owner partition only.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM P;
+           QUERY PLAN           
+--------------------------------
+ HashAggregate
+   Group Key: p.d
+   ->  Append
+         ->  Seq Scan on p1 p_1
+         ->  Seq Scan on p2 p_2
+(5 rows)
+
+-- d is unique for its owner partition, but this query only talks to one partition.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM p WHERE a = 1;
+              QUERY PLAN               
+---------------------------------------
+ Bitmap Heap Scan on p1 p
+   Recheck Cond: (a = 1)
+   ->  Bitmap Index Scan on p1_a_c_idx
+         Index Cond: (a = 1)
+(4 rows)
+
+-- (a, c) is unique locally and contains the partition key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT a, c FROM p WHERE a = 1;
+              QUERY PLAN               
+---------------------------------------
+ Bitmap Heap Scan on p1 p
+   Recheck Cond: (a = 1)
+   ->  Bitmap Index Scan on p1_a_c_idx
+         Index Cond: (a = 1)
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c FROM p WHERE a = 1;
+              QUERY PLAN               
+---------------------------------------
+ Bitmap Heap Scan on p1 p
+   Recheck Cond: (a = 1)
+   ->  Bitmap Index Scan on p1_a_c_idx
+         Index Cond: (a = 1)
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM p WHERE c = 1;
+                 QUERY PLAN                  
+---------------------------------------------
+ Append
+   ->  Bitmap Heap Scan on p1 p_1
+         Recheck Cond: (c = 1)
+         ->  Bitmap Index Scan on p1_a_c_idx
+               Index Cond: (c = 1)
+   ->  Bitmap Heap Scan on p2 p_2
+         Recheck Cond: (c = 1)
+         ->  Bitmap Index Scan on p2_a_c_idx
+               Index Cond: (c = 1)
+(9 rows)
+
+DROP TABLE p;
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+DEALLOCATE pt;
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
+-- Test Whole Row Not NULL cases
+CREATE TABLE x (a int UNIQUE);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM x WHERE x = '(1)'::x;
+         QUERY PLAN         
+----------------------------
+ Seq Scan on x
+   Filter: (x.* = '(1)'::x)
+(2 rows)
+
+DROP TABLE x;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 044d515507..ad025206fb 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -413,13 +413,13 @@ explain (costs off) select * from t1 group by a,b,c,d;
 explain (costs off) select a,c from t1 group by a,c,d;
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
 
 -- Cannot optimize when PK is deferrable
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 33102744eb..14bf36d908 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -135,3 +135,149 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+
+-- Test join
+-- both uqk1 (c, d) and uqk2 (pk) are unique keys, so distinct is not needed.
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+
+-- Distinct is needed since the outer join allows multiple NULL values.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+
+-- Distinct is not needed since uqk1 is the left table in the outer join.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+-- The combination of UniqueKeys is still unique no matter what the join clause is.
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+
+-- Test the UniqueKey for the one-row case; DISTINCT is not needed either.
+-- uqk1.d is a uniquekey due to the onerow rule; uqk2.pk is the primary key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+
+-- Both uqk1.d and uqk2.d are uniquekeys due to the onerow rule.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+
+-- Both UniqueKeys are in the targetList, so DISTINCT is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+
+-- Test SubQuery
+-- t2(a, c) is a unique key because of the group by
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+
+-- Test partitioned TABLE
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+
+-- create unique index on dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+DROP INDEX dist_p_uk_b_c;
+
+-- we also support creating a unique index on each child table
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- Not OK, since dist_p1 has no such index
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+
+DROP TABLE dist_p;
+
+-- Test partition with different order.
+CREATE TABLE p (a INT NOT NULL, b DATE NOT NULL,c INT NOT NULL, d VARCHAR(5) NOT NULL) PARTITION BY LIST (a);
+
+-- First partition with different attr order.
+CREATE TABLE p1(d VARCHAR(5) NOT NULL, c INT NOT NULL, b DATE NOT NULL, a INT NOT NULL);
+ALTER TABLE p ATTACH PARTITION p1 FOR VALUES IN (1);
+
+-- 2nd partition
+CREATE TABLE p2 PARTITION OF p FOR VALUES IN (2);
+
+-- Unique index on each partition locally, without the partition key.
+CREATE UNIQUE index p1_unq_d ON p1(d);
+CREATE UNIQUE index p2_unq_d ON p2(d);
+
+-- Unique index including the partition key.
+CREATE UNIQUE index p_unq_ac ON  p(a, c);
+
+ANALYZE p;
+
+-- d is unique for its owner partition only.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM P;
+
+-- d is unique for its owner partition, but this query talks to only one partition.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM p WHERE a = 1;
+
+-- (a, c) is unique locally and contains the partition key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT a, c FROM p WHERE a = 1;
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c FROM p WHERE a = 1;
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM p WHERE c = 1;
+
+DROP TABLE p;
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+DEALLOCATE pt;
+
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
+
+-- Test Whole Row Not NULL cases
+CREATE TABLE x (a int UNIQUE);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM x WHERE x = '(1)'::x;
+DROP TABLE x;
-- 
2.21.0

v10-0001-Introduce-RelOptInfo-notnullattrs-attribute.patchapplication/octet-stream; name=v10-0001-Introduce-RelOptInfo-notnullattrs-attribute.patchDownload
From 8abdee6d65f91c30cdad6809027699d0b76b02e3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sun, 3 May 2020 22:37:46 +0800
Subject: [PATCH v10 1/6] Introduce RelOptInfo->notnullattrs attribute

The notnullattrs is calculated from the catalog and the run-time query. That
information is translated to child relations as well for partitioned
tables.
---
 src/backend/optimizer/path/allpaths.c  | 31 ++++++++++++++++++++++++++
 src/backend/optimizer/plan/initsplan.c | 10 +++++++++
 src/backend/optimizer/util/plancat.c   | 10 +++++++++
 src/include/nodes/pathnodes.h          |  2 ++
 4 files changed, 53 insertions(+)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6da0dcd61c..484dab0a1a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1005,6 +1005,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		RelOptInfo *childrel;
 		ListCell   *parentvars;
 		ListCell   *childvars;
+		int i = -1;
 
 		/* append_rel_list contains all append rels; ignore others */
 		if (appinfo->parent_relid != parentRTindex)
@@ -1061,6 +1062,36 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 								   (Node *) rel->reltarget->exprs,
 								   1, &appinfo);
 
+		/* Copy notnullattrs. */
+		while ((i = bms_next_member(rel->notnullattrs, i)) > 0)
+		{
+			AttrNumber attno = i + FirstLowInvalidHeapAttributeNumber;
+			AttrNumber child_attno;
+			if (attno == 0)
+			{
+				/* Whole row is not null, so must be same for child */
+				childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+														attno - FirstLowInvalidHeapAttributeNumber);
+				break;
+			}
+			if (attno < 0 )
+				/* no need to translate system column */
+				child_attno = attno;
+			else
+			{
+				Node * node = list_nth(appinfo->translated_vars, attno - 1);
+				if (!IsA(node, Var))
+					/* This may happen in the UNION case, like (SELECT a FROM t1 UNION
+					 * SELECT a + 3 FROM t2) t, where we know t.a is not null.
+					 */
+					continue;
+				child_attno = castNode(Var, node)->varattno;
+			}
+
+			childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+													child_attno - FirstLowInvalidHeapAttributeNumber);
+		}
+
 		/*
 		 * We have to make child entries in the EquivalenceClass data
 		 * structures as well.  This is needed either if the parent
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e978b491f6..95b1b14cd3 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -830,6 +830,16 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		{
 			Node	   *qual = (Node *) lfirst(l);
 
+			/* Set the not null info now */
+			ListCell	*lc;
+			List		*non_nullable_vars = find_nonnullable_vars(qual);
+			foreach(lc, non_nullable_vars)
+			{
+				Var *var = lfirst_node(Var, lc);
+				RelOptInfo *rel = root->simple_rel_array[var->varno];
+				rel->notnullattrs = bms_add_member(rel->notnullattrs,
+												   var->varattno - FirstLowInvalidHeapAttributeNumber);
+			}
 			distribute_qual_to_rels(root, qual,
 									false, below_outer_join, JOIN_INNER,
 									root->qual_security_level,
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 25545029d7..0b2f9d398a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -463,6 +464,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->notnullattrs == NULL);
+	for(i = 0; i < relation->rd_att->natts; i++)
+	{
+		FormData_pg_attribute attr = relation->rd_att->attrs[i];
+		if (attr.attnotnull)
+			rel->notnullattrs = bms_add_member(rel->notnullattrs,
+											   attr.attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 485d1b06c9..9e3ebd488a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -709,6 +709,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	/* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */
+	Bitmapset		*notnullattrs;
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
-- 
2.21.0

v10-0006-Join-removal-at-run-time-with-UniqueKey.patchapplication/octet-stream; name=v10-0006-Join-removal-at-run-time-with-UniqueKey.patchDownload
From 04835a4359cc30727e874113ccfccf3ea715aa5f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Thu, 7 May 2020 08:36:29 +0800
Subject: [PATCH v10 6/6] Join removal at run-time with UniqueKey.

We add another join removal pass during build_join_rel to take
advantage of the UniqueKey-based remove_useless_join.  However this new
strategy can't be a replacement for the current one, since we only
know 2 relations at that time, so it can't handle something like

SELECT a.* FROM a LEFT JOIN (b LEFT JOIN c ON b.c_id = c.id)
ON (a.b_id = b.id);

However it can handle more cases than the current strategy, like

SELECT  t1.a
FROM m3 t1
LEFT JOIN (SELECT m1.a FROM m1, m2 WHERE m1.b = m2.a) t2;

As for the implementation, it is a very PoC version. The main
idea is that if the join can be removed, we still need a joinrel, but
with the information of the outerrel only. However I had to do a lot
of work to get there.

1. If the innerrel can be removed, we don't need to build a pathlist for the
   joinrel; we just reuse the pathlist from the outerrel. However there are many
   places that assert rel->pathlist[*]->parent == rel, so I copied the pathlist
   and changed the parent to the joinrel.
2. When creating a plan for some paths on RTE_RELATION, we need to know the
   relation Oid via path->parent->relid, so we have to use outerrel->relid
   to overwrite joinrel->relid, which was 0 before.
3. Almost the same paths as in item 2 usually assert best_path->parent->rtekind ==
   RTE_RELATION; now the paths may appear in a joinrel, so I used
   outerrel->rtekind to overwrite joinrel->rtekind.
4. I guess there are some dependencies between path->pathtarget and
   rel->reltarget. Since we reuse the pathlist of the outerrel, I used
   outerrel->reltarget as well. If the join can be removed, then
   list_length(outerrel->reltarget->exprs) >= list_length(joinrel->reltarget->exprs);
   we can rely on the ProjectionPath to reduce the tlist.
---
 src/backend/optimizer/path/joinrels.c |   8 +-
 src/backend/optimizer/util/relnode.c  | 263 +++++++++++++++++++++++++-
 src/include/optimizer/pathnode.h      |   4 +-
 src/test/regress/expected/join.out    |  39 ++++
 src/test/regress/sql/join.sql         |  26 +++
 5 files changed, 333 insertions(+), 7 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index b9163ee8ff..7e393c09f9 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -691,6 +691,7 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 	SpecialJoinInfo sjinfo_data;
 	RelOptInfo *joinrel;
 	List	   *restrictlist;
+	bool	innerrel_removed = false;
 
 	/* We should never try to join two overlapping sets of rels. */
 	Assert(!bms_overlap(rel1->relids, rel2->relids));
@@ -744,7 +745,7 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 	 * goes with this particular joining.
 	 */
 	joinrel = build_join_rel(root, joinrelids, rel1, rel2, sjinfo,
-							 &restrictlist);
+							 &restrictlist, &innerrel_removed);
 
 	/*
 	 * If we've already proven this join is empty, we needn't consider any
@@ -756,9 +757,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 		return joinrel;
 	}
 
+	if (!innerrel_removed)
 	/* Add paths to the join relation. */
-	populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
-								restrictlist);
+		populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
+									restrictlist);
 
 	bms_free(joinrelids);
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index a203e6f1ff..f488d87b1c 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -22,6 +22,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/inherit.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/placeholder.h"
@@ -73,6 +74,11 @@ static void build_child_join_reltarget(PlannerInfo *root,
 									   int nappinfos,
 									   AppendRelInfo **appinfos);
 
+static bool join_canbe_removed(PlannerInfo *root,
+							   SpecialJoinInfo *sjinfo,
+							   RelOptInfo *joinrel,
+							   RelOptInfo *innerrel,
+							   List *restrictlist);
 
 /*
  * setup_simple_rel_arrays
@@ -579,7 +585,8 @@ build_join_rel(PlannerInfo *root,
 			   RelOptInfo *outer_rel,
 			   RelOptInfo *inner_rel,
 			   SpecialJoinInfo *sjinfo,
-			   List **restrictlist_ptr)
+			   List **restrictlist_ptr,
+			   bool *innerrel_removed)
 {
 	RelOptInfo *joinrel;
 	List	   *restrictlist;
@@ -718,6 +725,64 @@ build_join_rel(PlannerInfo *root,
 	 */
 	joinrel->has_eclass_joins = has_relevant_eclass_joinclause(root, joinrel);
 
+	if (join_canbe_removed(root, sjinfo,
+						   joinrel, inner_rel,
+						   restrictlist))
+	{
+		ListCell *lc;
+
+		joinrel->rows = outer_rel->rows;
+		joinrel->consider_startup = outer_rel->consider_param_startup;
+		joinrel->consider_param_startup = outer_rel->consider_param_startup;
+		joinrel->consider_parallel = outer_rel->consider_parallel;
+
+		/* Rely on the projection path to reduce the tlist. */
+		joinrel->reltarget = outer_rel->reltarget;
+
+		joinrel->direct_lateral_relids = outer_rel->direct_lateral_relids;
+		joinrel->lateral_relids = outer_rel->lateral_relids;
+
+		joinrel->unique_for_rels = outer_rel->unique_for_rels;
+		joinrel->non_unique_for_rels = outer_rel->non_unique_for_rels;
+		joinrel->baserestrictinfo = outer_rel->baserestrictinfo;
+		joinrel->baserestrictcost = outer_rel->baserestrictcost;
+		joinrel->baserestrict_min_security = outer_rel->baserestrict_min_security;
+		joinrel->uniquekeys = outer_rel->uniquekeys;
+		joinrel->consider_partitionwise_join = outer_rel->consider_partitionwise_join;
+		joinrel->top_parent_relids = outer_rel->top_parent_relids;
+
+		/* Some scan paths need to know which base relation to scan; they use the
+		 * relid field, so we have to use outerrel->relid.
+		 */
+		joinrel->relid = outer_rel->relid;
+
+		/* Almost the same paths as above assert that the rtekind is RTE_RELATION,
+		 * so we need to set it to match the outerrel as well.
+		 */
+		joinrel->rtekind = RTE_RELATION;
+
+		/* Make sure path->parent points to the current joinrel; we can't update it in place. */
+		foreach(lc, outer_rel->pathlist)
+		{
+			Size sz = size_of_path(lfirst(lc));
+			Path *path = palloc(sz);
+			memcpy(path, lfirst(lc), sz);
+			path->parent = joinrel;
+			add_path(joinrel, path);
+		}
+
+		foreach(lc, joinrel->partial_pathlist)
+		{
+			Size sz = size_of_path(lfirst(lc));
+			Path *path = palloc(sz);
+			memcpy(path, lfirst(lc), sz);
+			path->parent = joinrel;
+			add_partial_path(joinrel, path);
+		}
+		*innerrel_removed = true;
+	}
+	else
+	{
 	/* Store the partition information. */
 	build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
 								 sjinfo->jointype);
@@ -746,7 +811,7 @@ build_join_rel(PlannerInfo *root,
 		is_parallel_safe(root, (Node *) restrictlist) &&
 		is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
 		joinrel->consider_parallel = true;
-
+	}
 	/* Add the joinrel to the PlannerInfo. */
 	add_join_rel(root, joinrel);
 
@@ -759,11 +824,18 @@ build_join_rel(PlannerInfo *root,
 	if (root->join_rel_level)
 	{
 		Assert(root->join_cur_level > 0);
-		Assert(root->join_cur_level <= bms_num_members(joinrel->relids));
+	   // Assert(root->join_cur_level <= bms_num_members(joinrel->relids));
 		root->join_rel_level[root->join_cur_level] =
 			lappend(root->join_rel_level[root->join_cur_level], joinrel);
 	}
 
+	/* elog(INFO, "lev-%d Build JoinRel (%s) with %s and %s, inner is removed: %d", */
+	/*	 root->join_cur_level, */
+	/*	 bmsToString(joinrelids), */
+	/*	 bmsToString(outer_rel->relids), */
+	/*	 bmsToString(inner_rel->relids), */
+	/*	 joinrel->removed); */
+
 	return joinrel;
 }
 
@@ -2027,3 +2099,188 @@ build_child_join_reltarget(PlannerInfo *root,
 	childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
 	childrel->reltarget->width = parentrel->reltarget->width;
 }
+
+static bool
+join_canbe_removed(PlannerInfo *root,
+				   SpecialJoinInfo *sjinfo,
+				   RelOptInfo *joinrel,
+				   RelOptInfo *innerrel,
+				   List *restrictlist)
+{
+	Bitmapset	*vars;
+	List	*exprs = NIL;
+	ListCell	*lc;
+	Bitmapset	*tmp;
+	bool	res;
+
+	if (sjinfo->jointype != JOIN_LEFT)
+		return false;
+
+	if (innerrel->uniquekeys == NIL)
+		return false;
+
+	/*
+	 * Check if any of the innerrel's columns can't be removed.
+	 */
+
+	vars = pull_varnos((Node*)joinrel->reltarget->exprs);
+	tmp = bms_intersect(vars, innerrel->relids);
+	if (!bms_is_empty(tmp))
+		return false;
+
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+		if (rinfo->can_join)
+		{
+			if (rinfo->mergeopfamilies != NIL)
+			{
+				if (bms_is_subset(rinfo->left_relids, innerrel->relids))
+					exprs = lappend(exprs, get_leftop(rinfo->clause));
+				else if (bms_is_subset(rinfo->right_relids, innerrel->relids))
+					exprs = lappend(exprs, get_rightop(rinfo->clause));
+				else
+					Assert(false);
+			}
+			else
+				/* Not mergeable join clause, we have to keep it */
+				return false;
+		}
+		else
+		{
+			/*
+			 * If the rinfo is not a joinable clause and it was not pushed down to
+			 * the base relation's baserestrictinfo, it must be in the ON clause.
+			 * Example: SELECT .. FROM t1 left join t2 on t1.a = 10;
+			 * In this case we can't remove the inner join as well.
+			 */
+			return false;
+		}
+	}
+	res =  relation_has_uniquekeys_for(root, innerrel, exprs, true);
+	return res;
+}
+
+
+size_t
+size_of_path(Path *path)
+{
+	switch(path->type)
+	{
+		case T_Path:
+			return sizeof(Path);
+		case T_IndexPath:
+			return sizeof(IndexPath);
+		case T_BitmapHeapPath:
+			return sizeof(BitmapHeapPath);
+		case T_TidPath:
+			return sizeof(TidPath);
+		case T_SubqueryScanPath:
+			return sizeof(SubqueryScanPath);
+		case T_ForeignPath:
+			return sizeof(ForeignPath);
+		case T_CustomPath:
+			return sizeof(CustomPath);
+
+
+		case T_NestPath:
+			return sizeof(NestPath);
+
+
+		case T_MergePath:
+			return sizeof(MergePath);
+
+
+		case T_HashPath:
+			return sizeof(HashPath);
+
+
+		case T_AppendPath:
+			return sizeof(AppendPath);
+
+
+		case T_MergeAppendPath:
+			return sizeof(MergeAppendPath);
+
+
+		case T_GroupResultPath:
+			return sizeof(GroupResultPath);
+
+
+		case T_MaterialPath:
+			return sizeof(MaterialPath);
+
+
+		case T_UniquePath:
+			return sizeof(UniquePath);
+
+
+		case T_GatherPath:
+			return sizeof(GatherPath);
+
+
+		case T_GatherMergePath:
+			return sizeof(GatherMergePath);
+
+
+		case T_ProjectionPath:
+			return sizeof(ProjectionPath);
+
+
+		case T_ProjectSetPath:
+			return sizeof(ProjectSetPath);
+
+
+		case T_SortPath:
+			return sizeof(SortPath);
+
+
+		case T_IncrementalSortPath:
+			return sizeof(IncrementalSortPath);
+
+
+		case T_GroupPath:
+			return sizeof(GroupPath);
+
+
+		case T_UpperUniquePath:
+			return sizeof(UpperUniquePath);
+
+
+		case T_AggPath:
+			return sizeof(AggPath);
+
+
+		case T_GroupingSetsPath:
+			return sizeof(GroupingSetsPath);
+
+
+		case T_MinMaxAggPath:
+			return sizeof(MinMaxAggPath);
+
+
+		case T_WindowAggPath:
+			return sizeof(WindowAggPath);
+
+
+		case T_SetOpPath:
+			return sizeof(SetOpPath);
+
+
+		case T_RecursiveUnionPath:
+			return sizeof(RecursiveUnionPath);
+
+
+		case T_LockRowsPath:
+			return sizeof(LockRowsPath);
+		case T_ModifyTablePath:
+			return sizeof(ModifyTablePath);
+		case T_LimitPath:
+			return sizeof(LimitPath);
+		default:
+			elog(ERROR, "unrecognized path type: %s",
+				 nodeToString(&path->type));
+			break;
+	}
+	return 0;
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 715a24ad29..254961b2b4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -294,7 +294,8 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
 								  RelOptInfo *outer_rel,
 								  RelOptInfo *inner_rel,
 								  SpecialJoinInfo *sjinfo,
-								  List **restrictlist_ptr);
+								  List **restrictlist_ptr,
+								  bool *innerrel_removed);
 extern Relids min_join_parameterization(PlannerInfo *root,
 										Relids joinrelids,
 										RelOptInfo *outer_rel,
@@ -321,4 +322,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
 										RelOptInfo *parent_joinrel, List *restrictlist,
 										SpecialJoinInfo *sjinfo, JoinType jointype);
 
+extern size_t size_of_path(Path *path);
 #endif							/* PATHNODE_H */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 8378936eda..b59c9a73ae 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6283,3 +6283,42 @@ where exists (select 1 from j3
 (13 rows)
 
 drop table j3;
+create table m1 (a int primary key,  b int, c int);
+create table m2 (a int primary key,  b int, c int);
+create table m3 (a int primary key,  b int, c int);
+explain (verbose, costs off)
+select  t1.a
+from m3 t1
+left join (select m1.a from m1, m2 where m1.b = m2.a) t2
+on (t1.a = t2.a);
+        QUERY PLAN        
+--------------------------
+ Seq Scan on public.m3 t1
+   Output: t1.a
+(2 rows)
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on (m1.a = m2.a)
+and m1.b in (select b from m3);
+         QUERY PLAN         
+----------------------------
+ Seq Scan on public.m1
+   Output: m1.a, m1.b, m1.c
+(2 rows)
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on m1.b = m2.a
+and m2.b in (select b from m3);
+         QUERY PLAN         
+----------------------------
+ Seq Scan on public.m1
+   Output: m1.a, m1.b, m1.c
+(2 rows)
+
+drop table m1;
+drop table m2;
+drop table m3;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3312542411..317354547d 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2169,3 +2169,29 @@ where exists (select 1 from j3
       and t1.unique1 < 1;
 
 drop table j3;
+
+create table m1 (a int primary key,  b int, c int);
+create table m2 (a int primary key,  b int, c int);
+create table m3 (a int primary key,  b int, c int);
+
+explain (verbose, costs off)
+select  t1.a
+from m3 t1
+left join (select m1.a from m1, m2 where m1.b = m2.a) t2
+on (t1.a = t2.a);
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on (m1.a = m2.a)
+and m1.b in (select b from m3);
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on m1.b = m2.a
+and m2.b in (select b from m3);
+
+drop table m1;
+drop table m2;
+drop table m3;
-- 
2.21.0

#52Floris Van Nee
florisvannee@Optiver.com
In reply to: Andy Fan (#51)
RE: [PATCH] Keeps tracking the uniqueness with UniqueKey

Hi Andy,

A small thing I found:

+static List *
+get_exprs_from_uniqueindex(IndexOptInfo *unique_index,
+                           List *const_exprs,
+                           List *const_expr_opfamilies,
+                           Bitmapset *used_varattrs,
+                           bool *useful,
+                           bool *multi_nullvals)
…
+             indexpr_item = list_head(unique_index->indexprs);
+             for(c = 0; c < unique_index->ncolumns; c++)
+             {

I believe the for loop must be over unique_index->nkeycolumns, rather than columns. It shouldn’t include the extra non-key columns. This can currently lead to invalid memory accesses as well a few lines later when it does an array access of unique_index->opfamily[c] – this array only has nkeycolumns entries.

-Floris

From: Andy Fan <zhihui.fan1213@gmail.com>
Sent: Sunday 19 July 2020 5:03 AM
To: Dmitry Dolgov <9erthalion6@gmail.com>
Cc: David Rowley <dgrowleyml@gmail.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>; Tom Lane <tgl@sss.pgh.pa.us>; Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>; rushabh.lathia@gmail.com
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey [External]

Fixed a test case in v10.

--
Best Regards
Andy Fan

#53Andy Fan
zhihui.fan1213@gmail.com
In reply to: Floris Van Nee (#52)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Hi Floris:

On Thu, Jul 23, 2020 at 3:22 AM Floris Van Nee <florisvannee@optiver.com>
wrote:

Hi Andy,

A small thing I found:

+static List *
+get_exprs_from_uniqueindex(IndexOptInfo *unique_index,
+                           List *const_exprs,
+                           List *const_expr_opfamilies,
+                           Bitmapset *used_varattrs,
+                           bool *useful,
+                           bool *multi_nullvals)
…
+		indexpr_item = list_head(unique_index->indexprs);
+		for(c = 0; c < unique_index->ncolumns; c++)
+		{

I believe the for loop must be over unique_index->nkeycolumns, rather than
columns. It shouldn’t include the extra non-key columns. This can currently
lead to invalid memory accesses as well a few lines later when it does an
array access of unique_index->opfamily[c] – this array only has nkeycolumns
entries.

You are correct, I would include this in the next version patch, Thank you
for this checking!


--
Best Regards
Andy Fan

#54Michael Paquier
michael@paquier.xyz
In reply to: Andy Fan (#53)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Tue, Aug 04, 2020 at 06:59:50AM +0800, Andy Fan wrote:

You are correct, I would include this in the next version patch, Thank you
for this checking!

Regression tests are failing with this patch set applied. The CF bot
says so, and I can reproduce that locally as well. Could you look at
that please? I have switched the patch to "waiting on author".
--
Michael

#55Andy Fan
zhihui.fan1213@gmail.com
In reply to: Michael Paquier (#54)
6 attachment(s)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Mon, Sep 7, 2020 at 3:22 PM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Aug 04, 2020 at 06:59:50AM +0800, Andy Fan wrote:

You are correct, I would include this in the next version patch, Thank you for this checking!

Regression tests are failing with this patch set applied. The CF bot
says so, and I can reproduce that locally as well. Could you look at
that please? I have switched the patch to "waiting on author".
--
Michael

Thank you Michael for checking it; I can reproduce the same locally after
rebasing to the latest master. The attached v11 fixes it and includes
the fix Floris found.

The status of this patch is that we are still discussing which data type
UniqueKey->expr should use. Both David [1] and I [2] shared some thoughts
about EquivalenceClasses, but neither of us has decided on it, so I haven't
changed anything about that yet. I can change it once we have decided.

[1] /messages/by-id/CAApHDvoDMyw=hTuW-258yqNK4bhW6CpguJU_GZBh4x+rnoem3w@mail.gmail.com

[2] /messages/by-id/CAKU4AWqy3Uv67=PR8RXG6LVoO-cMEwfW_LMwTxHdGrnu+cf+dA@mail.gmail.com

--
Best Regards
Andy Fan

Attachments:

v11-0002-Introduce-UniqueKey-attributes-on-RelOptInfo-str.patchapplication/octet-stream; name=v11-0002-Introduce-UniqueKey-attributes-on-RelOptInfo-str.patchDownload
From 271f5a918bec2982ee3bcbc3d7a86ce5599b6d73 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 11 May 2020 15:50:52 +0800
Subject: [PATCH v11 2/6] Introduce UniqueKey attributes on RelOptInfo struct.

UniqueKey is a set of exprs on a RelOptInfo which represents that the exprs
are unique on the given RelOptInfo. See README.uniquekey
for more information.
---
 src/backend/nodes/copyfuncs.c               |   13 +
 src/backend/nodes/list.c                    |   31 +
 src/backend/nodes/makefuncs.c               |   13 +
 src/backend/nodes/outfuncs.c                |   11 +
 src/backend/nodes/readfuncs.c               |   10 +
 src/backend/optimizer/path/Makefile         |    3 +-
 src/backend/optimizer/path/README.uniquekey |  131 +++
 src/backend/optimizer/path/allpaths.c       |   10 +
 src/backend/optimizer/path/joinpath.c       |    9 +-
 src/backend/optimizer/path/joinrels.c       |    2 +
 src/backend/optimizer/path/pathkeys.c       |    3 +-
 src/backend/optimizer/path/uniquekeys.c     | 1131 +++++++++++++++++++
 src/backend/optimizer/plan/planner.c        |   13 +-
 src/backend/optimizer/prep/prepunion.c      |    2 +
 src/backend/optimizer/util/appendinfo.c     |   44 +
 src/backend/optimizer/util/inherit.c        |   16 +-
 src/include/nodes/makefuncs.h               |    3 +
 src/include/nodes/nodes.h                   |    1 +
 src/include/nodes/pathnodes.h               |   29 +-
 src/include/nodes/pg_list.h                 |    2 +
 src/include/optimizer/appendinfo.h          |    3 +
 src/include/optimizer/optimizer.h           |    2 +
 src/include/optimizer/paths.h               |   43 +
 23 files changed, 1502 insertions(+), 23 deletions(-)
 create mode 100644 src/backend/optimizer/path/README.uniquekey
 create mode 100644 src/backend/optimizer/path/uniquekeys.c

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 0409a40b82..451ef1b766 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2273,6 +2273,16 @@ _copyPathKey(const PathKey *from)
 	return newnode;
 }
 
+static UniqueKey *
+_copyUniqueKey(const UniqueKey *from)
+{
+	UniqueKey	*newnode = makeNode(UniqueKey);
+
+	COPY_NODE_FIELD(exprs);
+	COPY_SCALAR_FIELD(multi_nullvals);
+
+	return newnode;
+}
 /*
  * _copyRestrictInfo
  */
@@ -5151,6 +5161,9 @@ copyObjectImpl(const void *from)
 		case T_PathKey:
 			retval = _copyPathKey(from);
 			break;
+		case T_UniqueKey:
+			retval = _copyUniqueKey(from);
+			break;
 		case T_RestrictInfo:
 			retval = _copyRestrictInfo(from);
 			break;
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 80fa8c84e4..a7a99b70f2 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -687,6 +687,37 @@ list_member_oid(const List *list, Oid datum)
 	return false;
 }
 
+/*
+ * return true iff every entry in "members" list is also present
+ * in the "target" list.
+ */
+bool
+list_is_subset(const List *members, const List *target)
+{
+	const ListCell	*lc1, *lc2;
+
+	Assert(IsPointerList(members));
+	Assert(IsPointerList(target));
+	check_list_invariants(members);
+	check_list_invariants(target);
+
+	foreach(lc1, members)
+	{
+		bool found = false;
+		foreach(lc2, target)
+		{
+			if (equal(lfirst(lc1), lfirst(lc2)))
+			{
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			return false;
+	}
+	return true;
+}
+
 /*
  * Delete the n'th cell (counting from 0) in list.
  *
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
index 49de285f01..646cf7c9a1 100644
--- a/src/backend/nodes/makefuncs.c
+++ b/src/backend/nodes/makefuncs.c
@@ -814,3 +814,16 @@ makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols)
 	v->va_cols = va_cols;
 	return v;
 }
+
+
+/*
+ * makeUniqueKey
+ *	  create a UniqueKey node
+ */
+UniqueKey *
+makeUniqueKey(List *exprs, bool multi_nullvals)
+{
+	UniqueKey  *ukey = makeNode(UniqueKey);
+
+	ukey->exprs = exprs;
+	ukey->multi_nullvals = multi_nullvals;
+	return ukey;
+}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index e2f177515d..c3a9632992 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2428,6 +2428,14 @@ _outPathKey(StringInfo str, const PathKey *node)
 	WRITE_BOOL_FIELD(pk_nulls_first);
 }
 
+static void
+_outUniqueKey(StringInfo str, const UniqueKey *node)
+{
+	WRITE_NODE_TYPE("UNIQUEKEY");
+	WRITE_NODE_FIELD(exprs);
+	WRITE_BOOL_FIELD(multi_nullvals);
+}
+
 static void
 _outPathTarget(StringInfo str, const PathTarget *node)
 {
@@ -4127,6 +4135,9 @@ outNode(StringInfo str, const void *obj)
 			case T_PathKey:
 				_outPathKey(str, obj);
 				break;
+			case T_UniqueKey:
+				_outUniqueKey(str, obj);
+				break;
 			case T_PathTarget:
 				_outPathTarget(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 42050ab719..3a18571d0c 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -452,6 +452,14 @@ _readSetOperationStmt(void)
 	READ_DONE();
 }
 
+static UniqueKey *
+_readUniqueKey(void)
+{
+	READ_LOCALS(UniqueKey);
+	READ_NODE_FIELD(exprs);
+	READ_BOOL_FIELD(multi_nullvals);
+	READ_DONE();
+}
 
 /*
  *	Stuff from primnodes.h.
@@ -2656,6 +2664,8 @@ parseNodeString(void)
 		return_value = _readCommonTableExpr();
 	else if (MATCH("SETOPERATIONSTMT", 16))
 		return_value = _readSetOperationStmt();
+	else if (MATCH("UNIQUEKEY", 9))
+		return_value = _readUniqueKey();
 	else if (MATCH("ALIAS", 5))
 		return_value = _readAlias();
 	else if (MATCH("RANGEVAR", 8))
diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..7b9820c25f 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekeys.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/README.uniquekey b/src/backend/optimizer/path/README.uniquekey
new file mode 100644
index 0000000000..5eac761995
--- /dev/null
+++ b/src/backend/optimizer/path/README.uniquekey
@@ -0,0 +1,131 @@
+1. What is UniqueKey?
+We can think of a UniqueKey as a set of exprs for a RelOptInfo which we are
+sure never yields the same result among all the rows of the relation. The
+simplest example of a UniqueKey is a primary key.
+
+We define the UniqueKey as below.
+
+typedef struct UniqueKey
+{
+        NodeTag	type;
+        List	*exprs;
+        bool	multi_nullvals;
+} UniqueKey;
+
+exprs is a list of exprs which is unique on the current RelOptInfo. exprs =
+NIL is a special case of UniqueKey which means there is only one row in that
+relation; it has stronger semantics than the others. For example, in SELECT
+uk FROM t, uk is a normal unique key and may have different values, while in
+SELECT colx FROM t WHERE uk = const, colx is unique AND we have only 1 value.
+This case can be used for innerrel_is_unique; the logic is handled specially
+in the add_uniquekey_for_onerow function.
+
+multi_nullvals: true means multiple null values may exist in these exprs, so
+uniqueness is not guaranteed in that case. This field is necessary for
+remove_useless_joins & reduce_unique_semijoins, where we don't mind such
+duplicated NULL values. It is set to true in 2 cases. One is a unique key
+built from a unique index whose related columns are nullable. The other one
+is the outer join case; see populate_joinrel_uniquekeys for details.
+
+
+The UniqueKey can be used in the following cases at least:
+1. remove_useless_joins.
+2. reduce_unique_semijoins.
+3. Remove the distinct node if the distinct clause is unique.
+4. Remove the agg node if the group by clause is unique.
+5. Index Skip Scan (WIP).
+6. Aggregation push down without 2-phase aggregation if the join can't
+   duplicate the aggregated rows (work-in-progress feature).
+
+2. How is it maintained?
+
+We have a set of populate_xxx_uniquekeys functions to maintain the uniquekeys
+for various cases; xxx includes baserel, joinrel, partitionedrel, distinctrel,
+groupedrel and unionrel. We also need to convert the uniquekeys from a
+subquery to the outer relation, which is what convert_subquery_uniquekeys
+does.
+
+1. The first part is about baserels. We handle 3 cases; suppose we have a
+unique index on (a, b).
+
+1. SELECT a, b FROM t.  UniqueKey (a, b)
+2. SELECT a FROM t WHERE b = 1;  UniqueKey (a)
+3. SELECT .. FROM t WHERE a = 1 AND b = 1;  UniqueKey (NIL).  onerow case;
+   every column is unique.
+
+2. The next part is joinrels; this part is the most error-prone. We
+simplified the rules as below:
+1. If a relation's UniqueKey can't be duplicated after the join, then it is
+   still valid for the joinrel. The function we use here is
+   innerrel_keeps_unique. The basic idea is innerrel.any_col = outerrel.uk.
+
+2. If a UniqueKey can't stay valid via rule 1, the combination of the
+   UniqueKeys from both sides is valid for sure.  We can prove this as: if
+   the unique exprs from rel1 are duplicated by rel2, the duplicated rows
+   must contain different unique exprs from rel2.
+
+More considerations about onerow:
+1. Even if a one-row relation can't be duplicated by the join, the result may
+   still contain multi_nullvals after an outer join.
+2. If either UniqueKey can be duplicated after the join, we can get one row
+   only when both sides are one row AND there is no outer join.
+3. Whenever a onerow UniqueKey is not valid any more, we need to convert it to
+   a normal UniqueKey, since we don't store the exprs for a one-row relation.
+   get_exprs_from_uniquekey is used here.
+
+
+More considerations about multi_nullvals after join:
+1. If the original UniqueKey has multi_nullvals, the final UniqueKey will have
+   multi_nullvals in any case.
+2. Even if a UniqueKey doesn't have multi_nullvals, it may get multi_nullvals
+   after an outer join, since the nullable side can produce NULL values.
+
+
+3. When we come to a subquery, we need convert_subquery_uniquekeys, just like
+convert_subquery_pathkeys.  Only a UniqueKey inside the subquery which is
+referenced as a Var in the outer relation will be reused. The relationship
+between the outer rel's Vars and the subquery's exprs is built with
+rel->subroot->processed_tlist.
+
+
+4. As for SRF functions, they break the uniqueness of a UniqueKey. However,
+they are handled in adjust_paths_for_srfs, which happens after query_planner,
+so we maintain the UniqueKey until there and reset it to NIL at that place.
+This can't help the distinct/group-by elimination cases, but it probably
+helps some other cases, like reduce_unique_semijoins/remove_useless_joins,
+and it is semantically correct.
+
+
+5. As for inheritance tables, we first maintain the UniqueKey on each
+childrel as usual. But for a partitioned table we need to maintain 2
+different kinds of UniqueKey: 1). UniqueKey on the parent relation;
+2). UniqueKey on the child relations for partition-wise queries.
+
+Example:
+CREATE TABLE p (a int not null, b int not null) partition by list (a);
+CREATE TABLE p0 partition of p for values in (1);
+CREATE TABLE p1 partition of p for values in (2);
+
+create unique index p0_b on p0(b);
+create unique index p1_b on p1(b);
+
+Now b is unique only at the partition level, so the distinct can't be removed
+in the following case: SELECT DISTINCT b FROM p;
+
+Another example is SELECT DISTINCT a, b FROM p WHERE a = 1; since only one
+partition is chosen, the UniqueKey on the child relation is the UniqueKey on
+the parent relation as well.
+
+Another usage of the partition-level UniqueKeys is that they are helpful
+for partition-wise join.
+
+As for the UniqueKey at the parent table level, it comes in 2 different ways.
+1). The UniqueKey is derived from a unique index as well, but the index must
+be the same in all the related child relations and the unique index must
+contain the partition key. Example:
+
+CREATE UNIQUE INDEX p_ab ON p(a, b);  -- where a is the partition key.
+
+-- Query
+SELECT a, b FROM p;  -- (a, b) is a UniqueKey of p.
+
+2). If the parent relation has only one childrel, the UniqueKey on the
+childrel is the UniqueKey on the parent as well.
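The two joinrel rules in the README above can be checked with a small model. The following Python sketch (tiny relations as lists of dicts, hypothetical column names, not planner code) shows that a side's key survives a join on the other side's unique key (rule 1), and that the concatenation of both keys stays unique even when neither key alone does (rule 2):

```python
from itertools import product

def is_unique(rows, cols):
    """True iff no two rows share the same values in cols."""
    seen = set()
    for r in rows:
        key = tuple(r[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True

rel1 = [{'a': 1, 'b': 10, 'c': 5}, {'a': 2, 'b': 10, 'c': 5}]  # unique on (a)
rel2 = [{'x': 10, 'y': 5}, {'x': 20, 'y': 5}]                  # unique on (x)

# Rule 1: joining on rel2's unique key (b = x) can't duplicate rel1's rows,
# so (a) is still a unique key of the join result.
j1 = [{**r, **s} for r, s in product(rel1, rel2) if r['b'] == s['x']]

# Rule 2: joining on non-unique columns (c = y) duplicates both sides, but
# the combined key (a, x) is still unique.
j2 = [{**r, **s} for r, s in product(rel1, rel2) if r['c'] == s['y']]
```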
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 754f6d64f6..66d246fa1a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -579,6 +579,12 @@ set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	 */
 	check_index_predicates(root, rel);
 
+	/*
+	 * Now that we've marked which partial indexes are suitable, we can
+	 * build the relation's unique keys.
+	 */
+	populate_baserel_uniquekeys(root, rel, rel->indexlist);
+
 	/* Mark rel with estimated output rows, width, etc */
 	set_baserel_size_estimates(root, rel);
 }
@@ -1314,6 +1320,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 	/* Add paths to the append relation. */
 	add_paths_to_append_rel(root, rel, live_childrels);
+	if (IS_PARTITIONED_REL(rel))
+		populate_partitionedrel_uniquekeys(root, rel, live_childrels);
 }
 
 
@@ -2387,6 +2395,8 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 										  pathkeys, required_outer));
 	}
 
+	convert_subquery_uniquekeys(root, rel, sub_final_rel);
+
 	/* If outer rel allows parallelism, do same for partial paths. */
 	if (rel->consider_parallel && bms_is_empty(required_outer))
 	{
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index db54a6ba2e..ef0fd2fb0b 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -71,13 +71,6 @@ static void consider_parallel_mergejoin(PlannerInfo *root,
 static void hash_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
 								 JoinType jointype, JoinPathExtraData *extra);
-static List *select_mergejoin_clauses(PlannerInfo *root,
-									  RelOptInfo *joinrel,
-									  RelOptInfo *outerrel,
-									  RelOptInfo *innerrel,
-									  List *restrictlist,
-									  JoinType jointype,
-									  bool *mergejoin_allowed);
 static void generate_mergejoin_paths(PlannerInfo *root,
 									 RelOptInfo *joinrel,
 									 RelOptInfo *innerrel,
@@ -1927,7 +1920,7 @@ hash_inner_and_outer(PlannerInfo *root,
  * if it is mergejoinable and involves vars from the two sub-relations
  * currently of interest.
  */
-static List *
+List *
 select_mergejoin_clauses(PlannerInfo *root,
 						 RelOptInfo *joinrel,
 						 RelOptInfo *outerrel,
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 2d343cd293..b9163ee8ff 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -924,6 +924,8 @@ populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
 
 	/* Apply partitionwise join technique, if possible. */
 	try_partitionwise_join(root, rel1, rel2, joinrel, sjinfo, restrictlist);
+
+	populate_joinrel_uniquekeys(root, joinrel, rel1, rel2, restrictlist, sjinfo->jointype);
 }
 
 
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index ce9bf87e9b..7e596d4194 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -33,7 +33,6 @@ static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
 static bool matches_boolean_partition_clause(RestrictInfo *rinfo,
 											 RelOptInfo *partrel,
 											 int partkeycol);
-static Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);
 
 
@@ -1035,7 +1034,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
  * We need this to ensure that we don't return pathkeys describing values
  * that are unavailable above the level of the subquery scan.
  */
-static Var *
+Var *
 find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle)
 {
 	ListCell   *lc;
diff --git a/src/backend/optimizer/path/uniquekeys.c b/src/backend/optimizer/path/uniquekeys.c
new file mode 100644
index 0000000000..c7ad76d28f
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekeys.c
@@ -0,0 +1,1131 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekeys.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/appendinfo.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/tlist.h"
+#include "rewrite/rewriteManip.h"
+
+
+/*
+ * This struct is used to help populate_joinrel_uniquekeys.
+ *
+ * added_to_joinrel is true if a uniquekey (from the outerrel or innerrel)
+ * has been added to the joinrel.
+ * useful is true if the exprs of the uniquekey still appear in the joinrel.
+ */
+typedef struct UniqueKeyContextData
+{
+	UniqueKey	*uniquekey;
+	bool	added_to_joinrel;
+	bool	useful;
+} *UniqueKeyContext;
+
+static List *initililze_uniquecontext_for_joinrel(RelOptInfo *inputrel);
+static bool innerrel_keeps_unique(PlannerInfo *root,
+								  RelOptInfo *outerrel,
+								  RelOptInfo *innerrel,
+								  List *restrictlist,
+								  bool reverse);
+
+static List *get_exprs_from_uniqueindex(IndexOptInfo *unique_index,
+										List *const_exprs,
+										List *const_expr_opfamilies,
+										Bitmapset *used_varattrs,
+										bool *useful,
+										bool *multi_nullvals);
+static List *get_exprs_from_uniquekey(RelOptInfo *joinrel,
+									  RelOptInfo *rel1,
+									  UniqueKey *ukey);
+static void add_uniquekey_for_onerow(RelOptInfo *rel);
+static bool add_combined_uniquekey(RelOptInfo *joinrel,
+								   RelOptInfo *outer_rel,
+								   RelOptInfo *inner_rel,
+								   UniqueKey *outer_ukey,
+								   UniqueKey *inner_ukey,
+								   JoinType jointype);
+
+/* Helpers for unique-index checking on partitioned tables */
+static bool index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind);
+static IndexOptInfo *simple_copy_indexinfo_to_parent(PlannerInfo *root,
+													 RelOptInfo *parentrel,
+													 IndexOptInfo *from);
+static bool simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2);
+static void adjust_partition_unique_indexlist(PlannerInfo *root,
+											  RelOptInfo *parentrel,
+											  RelOptInfo *childrel,
+											  List **global_unique_index);
+
+/* Helper function for grouped relation and distinct relation. */
+static void add_uniquekey_from_sortgroups(PlannerInfo *root,
+										  RelOptInfo *rel,
+										  List *sortgroups);
+
+/*
+ * populate_baserel_uniquekeys
+ *		Populate 'baserel''s uniquekeys list by looking at the rel's unique
+ *		indexes and baserestrictinfo.
+ */
+void
+populate_baserel_uniquekeys(PlannerInfo *root,
+							RelOptInfo *baserel,
+							List *indexlist)
+{
+	ListCell *lc;
+	List	*matched_uniq_indexes = NIL;
+
+	/* Attrs appearing in rel->reltarget->exprs. */
+	Bitmapset *used_attrs = NULL;
+
+	List	*const_exprs = NIL;
+	List	*expr_opfamilies = NIL;
+
+	Assert(baserel->rtekind == RTE_RELATION);
+
+	foreach(lc, indexlist)
+	{
+		IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+		matched_uniq_indexes = lappend(matched_uniq_indexes, ind);
+	}
+
+	if (matched_uniq_indexes == NIL)
+		return;
+
+	/* Check which attrs are used in baserel->reltarget */
+	pull_varattnos((Node *)baserel->reltarget->exprs, baserel->relid, &used_attrs);
+
+	/* Check which exprs are compared to a Const by a mergeable clause */
+	foreach(lc, baserel->baserestrictinfo)
+	{
+		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+
+		if (rinfo->mergeopfamilies == NIL)
+			continue;
+
+		if (bms_is_empty(rinfo->left_relids))
+		{
+			const_exprs = lappend(const_exprs, get_rightop(rinfo->clause));
+		}
+		else if (bms_is_empty(rinfo->right_relids))
+		{
+			const_exprs = lappend(const_exprs, get_leftop(rinfo->clause));
+		}
+		else
+			continue;
+
+		expr_opfamilies = lappend(expr_opfamilies, rinfo->mergeopfamilies);
+	}
+
+	foreach(lc, matched_uniq_indexes)
+	{
+		bool	multi_nullvals, useful;
+		List	*exprs = get_exprs_from_uniqueindex(lfirst_node(IndexOptInfo, lc),
+													const_exprs,
+													expr_opfamilies,
+													used_attrs,
+													&useful,
+													&multi_nullvals);
+		if (useful)
+		{
+			if (exprs == NIL)
+			{
+				/* All columns of the unique index matched a Const restrictinfo */
+				add_uniquekey_for_onerow(baserel);
+				return;
+			}
+			baserel->uniquekeys = lappend(baserel->uniquekeys,
+										  makeUniqueKey(exprs, multi_nullvals));
+		}
+	}
+}
+
+
+/*
+ * populate_partitionedrel_uniquekeys
+ * The UniqueKey on a partitioned rel comes from 2 cases:
+ * 1). Only one partition is involved in this query; the unique keys can be
+ * copied to the parent rel from the childrel.
+ * 2). There are some unique indexes which include the partition key and
+ * exist in all the related partitions.
+ * We never mind rule 2 if we hit rule 1.
+ */
+
+void
+populate_partitionedrel_uniquekeys(PlannerInfo *root,
+								   RelOptInfo *rel,
+								   List *childrels)
+{
+	ListCell	*lc;
+	List	*global_uniq_indexlist = NIL;
+	RelOptInfo *childrel;
+	bool is_first = true;
+
+	Assert(IS_PARTITIONED_REL(rel));
+
+	if (childrels == NIL)
+		return;
+
+	/*
+	 * If only one partition is used in this query, the UniqueKey of the
+	 * childrel is still valid at the parent level, but we need to convert
+	 * the exprs from the child form to the parent form.
+	 */
+	if (list_length(childrels) == 1)
+	{
+		/* Check for Rule 1 */
+		RelOptInfo *childrel = linitial_node(RelOptInfo, childrels);
+		ListCell	*lc;
+		Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
+		if (relation_is_onerow(childrel))
+		{
+			add_uniquekey_for_onerow(rel);
+			return;
+		}
+
+		foreach(lc, childrel->uniquekeys)
+		{
+			UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+			AppendRelInfo *appinfo = find_appinfo_by_child(root, childrel->relid);
+			List *parent_exprs = NIL;
+			bool can_reuse = true;
+			ListCell	*lc2;
+			foreach(lc2, ukey->exprs)
+			{
+				Var *var = (Var *)lfirst(lc2);
+				/*
+				 * If the expr is not a plain Var (e.g. it comes from an
+				 * expression index), it is hard to build the matching
+				 * expression in the parent, so ignore that case for now.
+				 */
+				if(!IsA(var, Var))
+				{
+					can_reuse = false;
+					break;
+				}
+				/* Convert it to parent var */
+				parent_exprs = lappend(parent_exprs, find_parent_var(appinfo, var));
+			}
+			if (can_reuse)
+				rel->uniquekeys = lappend(rel->uniquekeys,
+										  makeUniqueKey(parent_exprs,
+														ukey->multi_nullvals));
+		}
+	}
+	else
+	{
+		/* Check for rule 2 */
+		childrel = linitial_node(RelOptInfo, childrels);
+		foreach(lc, childrel->indexlist)
+		{
+			IndexOptInfo *ind = lfirst(lc);
+			IndexOptInfo *modified_index;
+			if (!ind->unique || !ind->immediate ||
+				(ind->indpred != NIL && !ind->predOK))
+				continue;
+
+			/*
+			 * During simple_copy_indexinfo_to_parent we need to convert each
+			 * child var to a parent var. An index on an expression is too
+			 * complex to handle, so ignore it for now.
+			 */
+			if (ind->indexprs != NIL)
+				continue;
+
+			modified_index = simple_copy_indexinfo_to_parent(root, rel, ind);
+			/*
+			 * If the unique index doesn't contain the partition key, then it
+			 * is unique on this partition only, so it is useless for us.
+			 */
+			if (!index_constains_partkey(rel, modified_index))
+				continue;
+
+			global_uniq_indexlist = lappend(global_uniq_indexlist,  modified_index);
+		}
+
+		if (global_uniq_indexlist != NIL)
+		{
+			foreach(lc, childrels)
+			{
+				RelOptInfo *child = lfirst(lc);
+				if (is_first)
+				{
+					is_first = false;
+					continue;
+				}
+				adjust_partition_unique_indexlist(root, rel, child, &global_uniq_indexlist);
+			}
+			/*
+			 * Now we have a list of unique indexes which are exactly the same
+			 * on all childrels.  Set the UniqueKey just as for a
+			 * non-partitioned table.
+			 */
+			populate_baserel_uniquekeys(root, rel, global_uniq_indexlist);
+		}
+	}
+}
+
+
+/*
+ * populate_distinctrel_uniquekeys
+ */
+void
+populate_distinctrel_uniquekeys(PlannerInfo *root,
+								RelOptInfo *inputrel,
+								RelOptInfo *distinctrel)
+{
+	/* The unique key before the distinct is still valid. */
+	distinctrel->uniquekeys = list_copy(inputrel->uniquekeys);
+	add_uniquekey_from_sortgroups(root, distinctrel, root->parse->distinctClause);
+}
+
+/*
+ * populate_grouprel_uniquekeys
+ */
+void
+populate_grouprel_uniquekeys(PlannerInfo *root,
+							 RelOptInfo *grouprel,
+							 RelOptInfo *inputrel)
+
+{
+	Query *parse = root->parse;
+	bool input_ukey_added = false;
+	ListCell *lc;
+
+	if (relation_is_onerow(inputrel))
+	{
+		add_uniquekey_for_onerow(grouprel);
+		return;
+	}
+	if (parse->groupingSets)
+		return;
+
+	/* A Normal group by without grouping set. */
+	if (parse->groupClause)
+	{
+		/*
+		 * Current even the groupby clause is Unique already, but if query has aggref
+		 * We have to create grouprel still. To keep the UnqiueKey short, we will check
+		 * the UniqueKey of input_rel still valid, if so we reuse it.
+		 */
+		foreach(lc, inputrel->uniquekeys)
+		{
+			UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+			if (list_is_subset(ukey->exprs, grouprel->reltarget->exprs))
+			{
+				grouprel->uniquekeys = lappend(grouprel->uniquekeys,
+											   ukey);
+				input_ukey_added = true;
+			}
+		}
+		if (!input_ukey_added)
+			/*
+			 * The group by clause must be a superset of
+			 * grouprel->reltarget->exprs except for the aggregation exprs,
+			 * so if such exprs are unique already, there is no need to
+			 * generate a new uniquekey for the group by exprs.
+			 */
+			add_uniquekey_from_sortgroups(root,
+										  grouprel,
+										  root->parse->groupClause);
+	}
+	else
+		/* It has aggregation but without a group by, so only one row returned */
+		add_uniquekey_for_onerow(grouprel);
+}
+
+/*
+ * simple_copy_uniquekeys
+ * Using a function for this one-line code makes it easy to check where we
+ * simply copied the uniquekeys.
+ */
+void
+simple_copy_uniquekeys(RelOptInfo *oldrel,
+					   RelOptInfo *newrel)
+{
+	newrel->uniquekeys = oldrel->uniquekeys;
+}
+
+/*
+ *  populate_unionrel_uniquekeys
+ */
+void
+populate_unionrel_uniquekeys(PlannerInfo *root,
+							  RelOptInfo *unionrel)
+{
+	ListCell	*lc;
+	List	*exprs = NIL;
+
+	Assert(unionrel->uniquekeys == NIL);
+
+	foreach(lc, unionrel->reltarget->exprs)
+	{
+		exprs = lappend(exprs, lfirst(lc));
+	}
+
+	if (exprs == NIL)
+		/* SQL: select union select; is valid, we need to handle it here. */
+		add_uniquekey_for_onerow(unionrel);
+	else
+		unionrel->uniquekeys = lappend(unionrel->uniquekeys,
+									   makeUniqueKey(exprs,false));
+
+}
+
+/*
+ * populate_joinrel_uniquekeys
+ *
+ * Populate the uniquekeys for the joinrel. We check each input relation to
+ * see if its UniqueKeys are still valid via innerrel_keeps_unique; if so, we
+ * add them to the joinrel.  The multi_nullvals field is changed to true for
+ * some outer join cases, and a one-row UniqueKey needs to be converted to a
+ * normal UniqueKey for the same cases as well.
+ * For a uniquekey of either input rel which can't stay unique after the
+ * join, we still check whether the combination of UniqueKeys from both
+ * sides is useful for us; if yes, we add it to the joinrel as well.
+ */
+void
+populate_joinrel_uniquekeys(PlannerInfo *root, RelOptInfo *joinrel,
+							RelOptInfo *outerrel, RelOptInfo *innerrel,
+							List *restrictlist, JoinType jointype)
+{
+	ListCell *lc, *lc2;
+	List	*clause_list = NIL;
+	List	*outerrel_ukey_ctx;
+	List	*innerrel_ukey_ctx;
+	bool	inner_onerow, outer_onerow;
+	bool	mergejoin_allowed;
+
+	/* Only the outerrel's UniqueKeys matter for SEMI/ANTI joins */
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
+	{
+		foreach(lc, outerrel->uniquekeys)
+		{
+			UniqueKey	*uniquekey = lfirst_node(UniqueKey, lc);
+			if (list_is_subset(uniquekey->exprs, joinrel->reltarget->exprs))
+				joinrel->uniquekeys = lappend(joinrel->uniquekeys, uniquekey);
+		}
+		return;
+	}
+
+	Assert(jointype == JOIN_LEFT || jointype == JOIN_FULL || jointype == JOIN_INNER);
+
+	/* Fast path */
+	if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+		return;
+
+	inner_onerow = relation_is_onerow(innerrel);
+	outer_onerow = relation_is_onerow(outerrel);
+
+	outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(outerrel);
+	innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(innerrel);
+
+	clause_list = select_mergejoin_clauses(root, joinrel, outerrel, innerrel,
+										   restrictlist, jointype,
+										   &mergejoin_allowed);
+
+	if (innerrel_keeps_unique(root, innerrel, outerrel, clause_list, true /* reverse */))
+	{
+		bool outer_impact = jointype == JOIN_FULL;
+		foreach(lc, outerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+
+			/*
+			 * If the outer relation has one row and its unique key is not
+			 * duplicated after the join, the joinrel still has one row,
+			 * unless the jointype is JOIN_FULL.
+			 */
+			if (outer_onerow && !outer_impact)
+			{
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else if (outer_onerow)
+			{
+				/*
+				 * The onerow outerrel becomes multiple rows and multi_nullvals
+				 * is changed to true. We also need to set the exprs correctly,
+				 * since they can't be NIL any more.
+				 */
+				ListCell *lc2;
+				foreach(lc2, get_exprs_from_uniquekey(joinrel, outerrel, NULL))
+				{
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(lfirst(lc2), true));
+				}
+			}
+			else
+			{
+				if (!ctx->uniquekey->multi_nullvals && outer_impact)
+					/* Change multi_nullvals to true due to the full join. */
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(ctx->uniquekey->exprs, true));
+				else
+					/* Just reuse it */
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  ctx->uniquekey);
+			}
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+	{
+		bool outer_impact = jointype == JOIN_FULL || jointype == JOIN_LEFT;
+
+		foreach(lc, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+
+			if (!list_is_subset(ctx->uniquekey->exprs, joinrel->reltarget->exprs))
+			{
+				ctx->useful = false;
+				continue;
+			}
+
+			if (inner_onerow &&  !outer_impact)
+			{
+				add_uniquekey_for_onerow(joinrel);
+				return;
+			}
+			else if (inner_onerow)
+			{
+				ListCell *lc2;
+				foreach(lc2, get_exprs_from_uniquekey(joinrel, innerrel, NULL))
+				{
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(lfirst(lc2), true));
+				}
+			}
+			else
+			{
+				if (!ctx->uniquekey->multi_nullvals && outer_impact)
+					/* Need to change multi_nullvals to true due to the outer join. */
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  makeUniqueKey(ctx->uniquekey->exprs,
+																true));
+				else
+					joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+												  ctx->uniquekey);
+
+			}
+			ctx->added_to_joinrel = true;
+		}
+	}
+
+	/*
+	 * The combination of the UniqueKeys from both sides is unique as well,
+	 * regardless of the join type, but there is no need to add it if one of
+	 * its subsets has already been added to the joinrel or it is not useful
+	 * for the joinrel.
+	 */
+	foreach(lc, outerrel_ukey_ctx)
+	{
+		UniqueKeyContext ctx1 = (UniqueKeyContext) lfirst(lc);
+		if (ctx1->added_to_joinrel || !ctx1->useful)
+			continue;
+		foreach(lc2, innerrel_ukey_ctx)
+		{
+			UniqueKeyContext ctx2 = (UniqueKeyContext) lfirst(lc2);
+			if (ctx2->added_to_joinrel || !ctx2->useful)
+				continue;
+			if (add_combined_uniquekey(joinrel, outerrel, innerrel,
+									   ctx1->uniquekey, ctx2->uniquekey,
+									   jointype))
+				/* Once we set a onerow UniqueKey on the joinrel, no others are needed. */
+				return;
+		}
+	}
+}
+
+
+/*
+ * convert_subquery_uniquekeys
+ *
+ * Convert the UniqueKeys of the subquery to the outer relation.
+ */
+void
+convert_subquery_uniquekeys(PlannerInfo *root,
+							RelOptInfo *currel,
+							RelOptInfo *sub_final_rel)
+{
+	ListCell	*lc;
+
+	if (sub_final_rel->uniquekeys == NIL)
+		return;
+
+	if (relation_is_onerow(sub_final_rel))
+	{
+		add_uniquekey_for_onerow(currel);
+		return;
+	}
+
+	Assert(currel->subroot != NULL);
+
+	foreach(lc, sub_final_rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		ListCell	*lc;
+		List	*exprs = NIL;
+		bool	ukey_useful = true;
+
+		/* One row case is handled above */
+		Assert(ukey->exprs != NIL);
+		foreach(lc, ukey->exprs)
+		{
+			Var *var;
+			TargetEntry *tle = tlist_member(lfirst(lc),
+											currel->subroot->processed_tlist);
+			if (tle == NULL)
+			{
+				ukey_useful = false;
+				break;
+			}
+			var = find_var_for_subquery_tle(currel, tle);
+			if (var == NULL)
+			{
+				ukey_useful = false;
+				break;
+			}
+			exprs = lappend(exprs, var);
+		}
+
+		if (ukey_useful)
+			currel->uniquekeys = lappend(currel->uniquekeys,
+										 makeUniqueKey(exprs,
+													   ukey->multi_nullvals));
+
+	}
+}
+
+/*
+ * innerrel_keeps_unique
+ *
+ * Check if the UniqueKeys of the innerrel are still valid after the join.
+ * An innerrel UniqueKey remains valid if a mergeable clause
+ * "innerrel.any_column = outerrel.uniquekey" exists in clause_list.
+ *
+ * Note: clause_list must be a list of mergeable RestrictInfos already.
+ */
+static bool
+innerrel_keeps_unique(PlannerInfo *root,
+					  RelOptInfo *outerrel,
+					  RelOptInfo *innerrel,
+					  List *clause_list,
+					  bool reverse)
+{
+	ListCell	*lc, *lc2, *lc3;
+
+	if (outerrel->uniquekeys == NIL || innerrel->uniquekeys == NIL)
+		return false;
+
+	/* Check if there is outerrel's uniquekey in mergeable clause. */
+	foreach(lc, outerrel->uniquekeys)
+	{
+		List	*outer_uq_exprs = lfirst_node(UniqueKey, lc)->exprs;
+		bool clauselist_matchs_all_exprs = true;
+		foreach(lc2, outer_uq_exprs)
+		{
+			Node *outer_uq_expr = lfirst(lc2);
+			bool find_uq_expr_in_clauselist = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *outer_expr;
+				if (reverse)
+					outer_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				else
+					outer_expr = rinfo->outer_is_left ? get_leftop(rinfo->clause) : get_rightop(rinfo->clause);
+				if (equal(outer_expr, outer_uq_expr))
+				{
+					find_uq_expr_in_clauselist = true;
+					break;
+				}
+			}
+			if (!find_uq_expr_in_clauselist)
+			{
+				/* No need to check the next exprs in the current uniquekey */
+				clauselist_matchs_all_exprs = false;
+				break;
+			}
+		}
+
+		if (clauselist_matchs_all_exprs)
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * relation_is_onerow
+ * Check if it is a one-row relation by checking UniqueKey.
+ */
+bool
+relation_is_onerow(RelOptInfo *rel)
+{
+	UniqueKey *ukey;
+	if (rel->uniquekeys == NIL)
+		return false;
+	ukey = linitial_node(UniqueKey, rel->uniquekeys);
+	return ukey->exprs == NIL && list_length(rel->uniquekeys) == 1;
+}
+
+/*
+ * relation_has_uniquekeys_for
+ *		Returns true if we have proofs that 'rel' cannot return multiple rows with
+ *		the same values in each of 'exprs'.  Otherwise returns false.
+ */
+bool
+relation_has_uniquekeys_for(PlannerInfo *root, RelOptInfo *rel,
+							List *exprs, bool allow_multinulls)
+{
+	ListCell *lc;
+
+	/*
+	 * A NIL exprs list is how we represent the one-row case, so we can't
+	 * rely on list_is_subset to handle this special case; refuse it here.
+	 */
+	if (exprs == NIL)
+		return false;
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		if (ukey->multi_nullvals && !allow_multinulls)
+			continue;
+		if (list_is_subset(ukey->exprs, exprs))
+			return true;
+	}
+	return false;
+}
+
+
+/*
+ * get_exprs_from_uniqueindex
+ *
+ * Return a list of exprs which the given unique index proves unique.  Set
+ * *useful to false if the unique index is not usable for building a UniqueKey.
+ */
+static List *
+get_exprs_from_uniqueindex(IndexOptInfo *unique_index,
+						   List *const_exprs,
+						   List *const_expr_opfamilies,
+						   Bitmapset *used_varattrs,
+						   bool *useful,
+						   bool *multi_nullvals)
+{
+	List	*exprs = NIL;
+	ListCell	*indexpr_item;
+	int	c = 0;
+
+	*useful = true;
+	*multi_nullvals = false;
+
+	indexpr_item = list_head(unique_index->indexprs);
+	for(c = 0; c < unique_index->nkeycolumns; c++)
+	{
+		int attr = unique_index->indexkeys[c];
+		Expr *expr;
+		bool	matched_const = false;
+		ListCell	*lc1, *lc2;
+
+		if(attr > 0)
+		{
+			expr = list_nth_node(TargetEntry, unique_index->indextlist, c)->expr;
+		}
+		else if (attr == 0)
+		{
+			/* Expression index */
+			expr = lfirst(indexpr_item);
+			indexpr_item = lnext(unique_index->indexprs, indexpr_item);
+		}
+		else /* attr < 0 */
+		{
+			Assert(false);	/* index on system columns is not supported */
+			expr = NULL;	/* keep compiler quiet */
+		}
+
+		/*
+		 * Check for the index_col = Const case (with regard to the opfamily)
+		 * to see if we can remove the index_col from the final UniqueKey->exprs.
+		 */
+		forboth(lc1, const_exprs, lc2, const_expr_opfamilies)
+		{
+			if (list_member_oid((List *)lfirst(lc2), unique_index->opfamily[c])
+				&& match_index_to_operand((Node *) lfirst(lc1), c, unique_index))
+			{
+				matched_const = true;
+				break;
+			}
+		}
+
+		if (matched_const)
+			continue;
+
+		/* Check if the indexed expr is used in rel */
+		if (attr > 0)
+		{
+			/*
+			 * Plain indexed column; if the column is not used in the rel,
+			 * the index is useless for building a UniqueKey.
+			 */
+			attr -= FirstLowInvalidHeapAttributeNumber;
+
+			if (!bms_is_member(attr, used_varattrs))
+			{
+				*useful = false;
+				break;
+			}
+		}
+		else if (!list_member(unique_index->rel->reltarget->exprs, expr))
+		{
+			/* Expression index but the expression is not used in rel */
+			*useful = false;
+			break;
+		}
+
+		/* check not null property. */
+		if (attr == 0)
+		{
+			/* We never know whether an expression yields NULL or not */
+			*multi_nullvals = true;
+		}
+		else if (!bms_is_member(attr, unique_index->rel->notnullattrs)
+				 && !bms_is_member(0 - FirstLowInvalidHeapAttributeNumber,
+								   unique_index->rel->notnullattrs))
+		{
+			*multi_nullvals = true;
+		}
+
+		exprs = lappend(exprs, expr);
+	}
+	return exprs;
+}
+
+
+/*
+ * add_uniquekey_for_onerow
+ * If we are sure that the relation returns only one row, then all the columns
+ * are unique. However, we don't need to create a UniqueKey for every column;
+ * we just set exprs = NIL and overwrite all the other UniqueKeys on this
+ * RelOptInfo, since this one has the strongest semantics.
+ */
+void
+add_uniquekey_for_onerow(RelOptInfo *rel)
+{
+	/*
+	 * We overwrite the previous UniqueKeys on purpose since this one has the
+	 * strongest semantics.
+	 */
+	rel->uniquekeys = list_make1(makeUniqueKey(NIL, false));
+}
+
+
+/*
+ * initililze_uniquecontext_for_joinrel
+ * Return a List of UniqueKeyContext for an inputrel
+ */
+static List *
+initililze_uniquecontext_for_joinrel(RelOptInfo *inputrel)
+{
+	List	*res = NIL;
+	ListCell *lc;
+	foreach(lc,  inputrel->uniquekeys)
+	{
+		UniqueKeyContext context;
+		context = palloc(sizeof(struct UniqueKeyContextData));
+		context->uniquekey = lfirst_node(UniqueKey, lc);
+		context->added_to_joinrel = false;
+		context->useful = true;
+		res = lappend(res, context);
+	}
+	return res;
+}
+
+
+/*
+ * get_exprs_from_uniquekey
+ *	Unify the way of getting a List of exprs from a one-row UniqueKey or a
+ * normal UniqueKey.  For the onerow case, every expr in rel1 is a valid
+ * UniqueKey.  Returns a List of expr Lists.
+ *
+ * rel1: The relation whose exprs you want to get.
+ * ukey: The UniqueKey whose exprs you want to get.
+ */
+static List *
+get_exprs_from_uniquekey(RelOptInfo *joinrel, RelOptInfo *rel1, UniqueKey *ukey)
+{
+	ListCell *lc;
+	bool onerow = rel1 != NULL && relation_is_onerow(rel1);
+
+	List	*res = NIL;
+	Assert(onerow || ukey);
+	if (onerow)
+	{
+		/* Only care about the exprs that still exist in the joinrel */
+		foreach(lc, joinrel->reltarget->exprs)
+		{
+			Bitmapset *relids = pull_varnos(lfirst(lc));
+			if (bms_is_subset(relids, rel1->relids))
+			{
+				res = lappend(res, list_make1(lfirst(lc)));
+			}
+		}
+	}
+	else
+	{
+		res = list_make1(ukey->exprs);
+	}
+	return res;
+}
+
+/*
+ * Partitioned table Unique Keys.
+ * A partitioned table's unique key is maintained as follows:
+ * 1. The index must be unique, as usual.
+ * 2. The index must contain the partition key.
+ * 3. The index must exist on all the child rels.  See simple_indexinfo_equal
+ *    for how we compare indexes.
+ */
+
+/*
+ * index_constains_partkey
+ * Return true if the index contains the partition key.
+ */
+static bool
+index_constains_partkey(RelOptInfo *partrel,  IndexOptInfo *ind)
+{
+	ListCell	*lc;
+	int	i;
+	Assert(IS_PARTITIONED_REL(partrel));
+	Assert(partrel->part_scheme->partnatts > 0);
+
+	for(i = 0; i < partrel->part_scheme->partnatts; i++)
+	{
+		Node *part_expr = linitial(partrel->partexprs[i]);
+		bool found_in_index = false;
+		foreach(lc, ind->indextlist)
+		{
+			Expr *index_expr = lfirst_node(TargetEntry, lc)->expr;
+			if (equal(index_expr, part_expr))
+			{
+				found_in_index = true;
+				break;
+			}
+		}
+		if (!found_in_index)
+			return false;
+	}
+	return true;
+}
+
+/*
+ * simple_indexinfo_equal
+ *
+ * Check whether the two indexes are the same.  The indexes here were
+ * COPIED from the childrels with some tiny changes applied (see
+ * simple_copy_indexinfo_to_parent).
+ */
+static bool
+simple_indexinfo_equal(IndexOptInfo *ind1, IndexOptInfo *ind2)
+{
+	Size oid_cmp_len = sizeof(Oid) * ind1->ncolumns;
+
+	return ind1->ncolumns == ind2->ncolumns &&
+		ind1->unique == ind2->unique &&
+		memcmp(ind1->indexkeys, ind2->indexkeys, sizeof(int) * ind1->ncolumns) == 0 &&
+		memcmp(ind1->opfamily, ind2->opfamily, oid_cmp_len) == 0 &&
+		memcmp(ind1->opcintype, ind2->opcintype, oid_cmp_len) == 0 &&
+		memcmp(ind1->sortopfamily, ind2->sortopfamily, oid_cmp_len) == 0 &&
+		equal(get_tlist_exprs(ind1->indextlist, true),
+			  get_tlist_exprs(ind2->indextlist, true));
+}
+
+
+/*
+ * The macros below are used by simple_copy_indexinfo_to_parent, which is so
+ * customized that it doesn't belong in copyfuncs.c, so they are copied here.
+ */
+#define COPY_POINTER_FIELD(fldname, sz) \
+	do { \
+		Size	_size = (sz); \
+		newnode->fldname = palloc(_size); \
+		memcpy(newnode->fldname, from->fldname, _size); \
+	} while (0)
+
+#define COPY_NODE_FIELD(fldname) \
+	(newnode->fldname = copyObjectImpl(from->fldname))
+
+#define COPY_SCALAR_FIELD(fldname) \
+	(newnode->fldname = from->fldname)
+
+
+/*
+ * simple_copy_indexinfo_to_parent (from partition)
+ * Copy the IndexOptInfo from a child relation to the parent relation with
+ * some modifications, which is used to test:
+ * 1. Whether the same index exists in all the childrels.
+ * 2. Whether the parentrel's reltarget/baserestrictinfo matches this index.
+ */
+static IndexOptInfo *
+simple_copy_indexinfo_to_parent(PlannerInfo *root,
+								RelOptInfo *parentrel,
+								IndexOptInfo *from)
+{
+	IndexOptInfo *newnode = makeNode(IndexOptInfo);
+	AppendRelInfo *appinfo = find_appinfo_by_child(root, from->rel->relid);
+	ListCell	*lc;
+	int	idx = 0;
+
+	COPY_SCALAR_FIELD(ncolumns);
+	COPY_SCALAR_FIELD(nkeycolumns);
+	COPY_SCALAR_FIELD(unique);
+	COPY_SCALAR_FIELD(immediate);
+	/* We just need to know if it is NIL or not */
+	COPY_SCALAR_FIELD(indpred);
+	COPY_SCALAR_FIELD(predOK);
+	COPY_POINTER_FIELD(indexkeys, from->ncolumns * sizeof(int));
+	COPY_POINTER_FIELD(indexcollations, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opfamily, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opcintype, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(sortopfamily, from->ncolumns * sizeof(Oid));
+	COPY_NODE_FIELD(indextlist);
+
+	/* Convert index exprs on child expr to expr on parent */
+	foreach(lc, newnode->indextlist)
+	{
+		TargetEntry *tle = lfirst_node(TargetEntry, lc);
+		/* Index on expression is ignored */
+		Assert(IsA(tle->expr, Var));
+		tle->expr = (Expr *) find_parent_var(appinfo, (Var *) tle->expr);
+		newnode->indexkeys[idx] = castNode(Var, tle->expr)->varattno;
+		idx++;
+	}
+	newnode->rel = parentrel;
+	return newnode;
+}
+
+/*
+ * adjust_partition_unique_indexlist
+ *
+ * global_unique_indexes: on entry, it contains the copied-and-modified unique
+ * indexes from the first partition.  We then check whether each index in it
+ * still exists in the given childrel; if not, we remove it.  At the end it
+ * contains only the indexes which exist in all the partitions.
+ */
+static void
+adjust_partition_unique_indexlist(PlannerInfo *root,
+								  RelOptInfo *parentrel,
+								  RelOptInfo *childrel,
+								  List **global_unique_indexes)
+{
+	ListCell	*lc, *lc2;
+	foreach(lc, *global_unique_indexes)
+	{
+		IndexOptInfo	*g_ind = lfirst_node(IndexOptInfo, lc);
+		bool found_in_child = false;
+
+		foreach(lc2, childrel->indexlist)
+		{
+			IndexOptInfo   *p_ind = lfirst_node(IndexOptInfo, lc2);
+			IndexOptInfo   *p_ind_copy;
+			if (!p_ind->unique || !p_ind->immediate ||
+				(p_ind->indpred != NIL && !p_ind->predOK))
+				continue;
+			p_ind_copy = simple_copy_indexinfo_to_parent(root, parentrel, p_ind);
+			if (simple_indexinfo_equal(p_ind_copy, g_ind))
+			{
+				found_in_child = true;
+				break;
+			}
+		}
+		if (!found_in_child)
+			/* The index doesn't exist in childrel, remove it from global_unique_indexes */
+			*global_unique_indexes = foreach_delete_current(*global_unique_indexes, lc);
+	}
+}
+
+/* Helper function for grouprel/distinctrel */
+static void
+add_uniquekey_from_sortgroups(PlannerInfo *root, RelOptInfo *rel, List *sortgroups)
+{
+	Query *parse = root->parse;
+	List	*exprs;
+
+	/*
+	 * XXX: If there are any Vars whose varlevelsup is not 0, the semantics are
+	 * imprecise; should we avoid this case or not?  levelsup = 1 is just a
+	 * demo; maybe we need to check every level other than 0, in which case we
+	 * would have to write another pull_var walker.
+	 */
+	List	*upper_vars = pull_vars_of_level((Node *) sortgroups, 1);
+
+	if (upper_vars != NIL)
+		return;
+
+	exprs = get_sortgrouplist_exprs(sortgroups, parse->targetList);
+	rel->uniquekeys = lappend(rel->uniquekeys,
+							  makeUniqueKey(exprs,
+											false /* sortgroupclause can't be multi_nullvals */));
+}
+
+
+/*
+ * add_combined_uniquekey
+ * The combination of the two UniqueKeys is a valid UniqueKey for the joinrel
+ * no matter the jointype.  Returns true iff the joinrel is one-row.
+ */
+bool
+add_combined_uniquekey(RelOptInfo *joinrel,
+					   RelOptInfo *outer_rel,
+					   RelOptInfo *inner_rel,
+					   UniqueKey *outer_ukey,
+					   UniqueKey *inner_ukey,
+					   JoinType jointype)
+{
+
+	ListCell	*lc1, *lc2;
+
+	/* If either side has multi_nullvals, or we have an outer join, the
+	 * combined UniqueKey has multi_nullvals. */
+	bool multi_nullvals = outer_ukey->multi_nullvals ||
+		inner_ukey->multi_nullvals || IS_OUTER_JOIN(jointype);
+
+	/* The only case where we can get a one-row joinrel after the join */
+	if (relation_is_onerow(outer_rel)
+		 && relation_is_onerow(inner_rel)
+		 && jointype == JOIN_INNER)
+	{
+		add_uniquekey_for_onerow(joinrel);
+		return true;
+	}
+
+	foreach(lc1, get_exprs_from_uniquekey(joinrel, outer_rel, outer_ukey))
+	{
+		foreach(lc2, get_exprs_from_uniquekey(joinrel, inner_rel, inner_ukey))
+		{
+			List *exprs = list_concat_copy(lfirst_node(List, lc1), lfirst_node(List, lc2));
+			joinrel->uniquekeys = lappend(joinrel->uniquekeys,
+										  makeUniqueKey(exprs,
+														multi_nullvals));
+		}
+	}
+	return false;
+}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 139c5e3dc2..6dc65faa11 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2389,6 +2389,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		add_path(final_rel, path);
 	}
 
+	simple_copy_uniquekeys(current_rel, final_rel);
+
 	/*
 	 * Generate partial paths for final_rel, too, if outer query levels might
 	 * be able to make use of them.
@@ -3899,6 +3901,8 @@ create_grouping_paths(PlannerInfo *root,
 	}
 
 	set_cheapest(grouped_rel);
+
+	populate_grouprel_uniquekeys(root, grouped_rel, input_rel);
 	return grouped_rel;
 }
 
@@ -4616,7 +4620,7 @@ create_window_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(window_rel);
-
+	simple_copy_uniquekeys(input_rel, window_rel);
 	return window_rel;
 }
 
@@ -4906,7 +4910,7 @@ create_distinct_paths(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(distinct_rel);
-
+	populate_distinctrel_uniquekeys(root, input_rel, distinct_rel);
 	return distinct_rel;
 }
 
@@ -5167,6 +5171,8 @@ create_ordered_paths(PlannerInfo *root,
 	 */
 	Assert(ordered_rel->pathlist != NIL);
 
+	simple_copy_uniquekeys(input_rel, ordered_rel);
+
 	return ordered_rel;
 }
 
@@ -6044,6 +6050,9 @@ adjust_paths_for_srfs(PlannerInfo *root, RelOptInfo *rel,
 	if (list_length(targets) == 1)
 		return;
 
+	/* UniqueKey is not valid after handling the SRF. */
+	rel->uniquekeys = NIL;
+
 	/*
 	 * Stack SRF-evaluation nodes atop each path for the rel.
 	 *
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 745f443e5c..ce290cb97b 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -689,6 +689,8 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 	/* Undo effects of possibly forcing tuple_fraction to 0 */
 	root->tuple_fraction = save_fraction;
 
+	/* Add the UniqueKeys */
+	populate_unionrel_uniquekeys(root, result_rel);
 	return result_rel;
 }
 
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index d722063cf3..44c37ecffc 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -746,3 +746,47 @@ find_appinfos_by_relids(PlannerInfo *root, Relids relids, int *nappinfos)
 	}
 	return appinfos;
 }
+
+/*
+ * find_appinfo_by_child
+ * Find the AppendRelInfo whose child_relid equals child_index.
+ */
+AppendRelInfo *
+find_appinfo_by_child(PlannerInfo *root, Index child_index)
+{
+	ListCell	*lc;
+	foreach(lc, root->append_rel_list)
+	{
+		AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
+		if (appinfo->child_relid == child_index)
+			return appinfo;
+	}
+	elog(ERROR, "could not find AppendRelInfo for child rel %u", child_index);
+	return NULL;
+}
+
+/*
+ * find_parent_var
+ * Translate a child rel's Var into the corresponding parent rel's Var.
+ */
+Var *
+find_parent_var(AppendRelInfo *appinfo, Var *child_var)
+{
+	ListCell	*lc;
+	Var	*res = NULL;
+	Index attno = 1;
+	foreach(lc, appinfo->translated_vars)
+	{
+		Node *child_node = lfirst(lc);
+		if (equal(child_node, child_var))
+		{
+			res = copyObject(child_var);
+			res->varattno = attno;
+			res->varno = appinfo->parent_relid;
+		}
+		attno++;
+	}
+	if (res == NULL)
+		elog(ERROR, "could not find parent var");
+	return res;
+}
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 3132fd35a5..d66b40ec50 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -736,6 +736,7 @@ apply_child_basequals(PlannerInfo *root, RelOptInfo *parentrel,
 		{
 			Node	   *onecq = (Node *) lfirst(lc2);
 			bool		pseudoconstant;
+			RestrictInfo	*child_rinfo;
 
 			/* check for pseudoconstant (no Vars or volatile functions) */
 			pseudoconstant =
@@ -747,13 +748,14 @@ apply_child_basequals(PlannerInfo *root, RelOptInfo *parentrel,
 				root->hasPseudoConstantQuals = true;
 			}
 			/* reconstitute RestrictInfo with appropriate properties */
-			childquals = lappend(childquals,
-								 make_restrictinfo((Expr *) onecq,
-												   rinfo->is_pushed_down,
-												   rinfo->outerjoin_delayed,
-												   pseudoconstant,
-												   rinfo->security_level,
-												   NULL, NULL, NULL));
+			child_rinfo =  make_restrictinfo((Expr *) onecq,
+											 rinfo->is_pushed_down,
+											 rinfo->outerjoin_delayed,
+											 pseudoconstant,
+											 rinfo->security_level,
+											 NULL, NULL, NULL);
+			child_rinfo->mergeopfamilies = rinfo->mergeopfamilies;
+			childquals = lappend(childquals, child_rinfo);
 			/* track minimum security level among child quals */
 			cq_min_security = Min(cq_min_security, rinfo->security_level);
 		}
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
index 31d9aedeeb..c83f17acb7 100644
--- a/src/include/nodes/makefuncs.h
+++ b/src/include/nodes/makefuncs.h
@@ -16,6 +16,7 @@
 
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 
 extern A_Expr *makeA_Expr(A_Expr_Kind kind, List *name,
@@ -105,4 +106,6 @@ extern GroupingSet *makeGroupingSet(GroupingSetKind kind, List *content, int loc
 
 extern VacuumRelation *makeVacuumRelation(RangeVar *relation, Oid oid, List *va_cols);
 
+extern UniqueKey* makeUniqueKey(List *exprs, bool multi_nullvals);
+
 #endif							/* MAKEFUNC_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 381d84b4e4..41110ed888 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -264,6 +264,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 9e3ebd488a..02e4458bef 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -730,6 +730,7 @@ typedef struct RelOptInfo
 	QualCost	baserestrictcost;	/* cost of evaluating the above */
 	Index		baserestrict_min_security;	/* min security_level found in
 											 * baserestrictinfo */
+	List	   *uniquekeys;		/* List of UniqueKey */
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
 	bool		has_eclass_joins;	/* T means joininfo is incomplete */
@@ -1047,6 +1048,28 @@ typedef struct PathKey
 } PathKey;
 
 
+/*
+ * UniqueKey
+ *
+ * Represents the unique properties held by a RelOptInfo.
+ *
+ * exprs is a list of exprs which are unique on the current RelOptInfo.
+ * exprs = NIL is a special case of UniqueKey, which means there is at most
+ * 1 row in the relation.
+ * multi_nullvals: true means multiple NULL values may exist in these exprs,
+ * so the uniqueness is not guaranteed in that case. This field is necessary
+ * for remove_useless_join & reduce_unique_semijoins, where we don't mind such
+ * duplicated NULL values. It is set to true in 2 cases: one is a unique key
+ * from a unique index where a related column is nullable; the other is the
+ * outer join case. See populate_joinrel_uniquekeys for details.
+ */
+typedef struct UniqueKey
+{
+	NodeTag		type;
+	List	   *exprs;
+	bool		multi_nullvals;
+} UniqueKey;
+
 /*
  * PathTarget
  *
@@ -2473,7 +2496,7 @@ typedef enum
  *
  * flags indicating what kinds of grouping are possible.
  * partial_costs_set is true if the agg_partial_costs and agg_final_costs
- * 		have been initialized.
+ *		have been initialized.
  * agg_partial_costs gives partial aggregation costs.
  * agg_final_costs gives finalization costs.
  * target_parallel_safe is true if target is parallel safe.
@@ -2503,8 +2526,8 @@ typedef struct
  * limit_tuples is an estimated bound on the number of output tuples,
  *		or -1 if no LIMIT or couldn't estimate.
  * count_est and offset_est are the estimated values of the LIMIT and OFFSET
- * 		expressions computed by preprocess_limit() (see comments for
- * 		preprocess_limit() for more information).
+ *		expressions computed by preprocess_limit() (see comments for
+ *		preprocess_limit() for more information).
  */
 typedef struct
 {
diff --git a/src/include/nodes/pg_list.h b/src/include/nodes/pg_list.h
index 14ea2766ad..621f54a9f8 100644
--- a/src/include/nodes/pg_list.h
+++ b/src/include/nodes/pg_list.h
@@ -528,6 +528,8 @@ extern bool list_member_ptr(const List *list, const void *datum);
 extern bool list_member_int(const List *list, int datum);
 extern bool list_member_oid(const List *list, Oid datum);
 
+extern bool list_is_subset(const List *members, const List *target);
+
 extern List *list_delete(List *list, void *datum);
 extern List *list_delete_ptr(List *list, void *datum);
 extern List *list_delete_int(List *list, int datum);
diff --git a/src/include/optimizer/appendinfo.h b/src/include/optimizer/appendinfo.h
index d6a27a60dd..e87c92a054 100644
--- a/src/include/optimizer/appendinfo.h
+++ b/src/include/optimizer/appendinfo.h
@@ -32,4 +32,7 @@ extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
 extern AppendRelInfo **find_appinfos_by_relids(PlannerInfo *root,
 											   Relids relids, int *nappinfos);
 
+extern AppendRelInfo *find_appinfo_by_child(PlannerInfo *root, Index child_index);
+extern Var *find_parent_var(AppendRelInfo *appinfo, Var *child_var);
+
 #endif							/* APPENDINFO_H */
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 3e4171056e..9445141263 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -23,6 +23,7 @@
 #define OPTIMIZER_H
 
 #include "nodes/parsenodes.h"
+#include "nodes/pathnodes.h"
 
 /*
  * We don't want to include nodes/pathnodes.h here, because non-planner
@@ -156,6 +157,7 @@ extern TargetEntry *get_sortgroupref_tle(Index sortref,
 										 List *targetList);
 extern TargetEntry *get_sortgroupclause_tle(SortGroupClause *sgClause,
 											List *targetList);
+extern Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle);
 extern Node *get_sortgroupclause_expr(SortGroupClause *sgClause,
 									  List *targetList);
 extern List *get_sortgrouplist_exprs(List *sgClauses,
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 10b6e81079..9217a8d6c6 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -240,5 +240,48 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 									   int strategy, bool nulls_first);
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
+extern List *select_mergejoin_clauses(PlannerInfo *root,
+									  RelOptInfo *joinrel,
+									  RelOptInfo *outerrel,
+									  RelOptInfo *innerrel,
+									  List *restrictlist,
+									  JoinType jointype,
+									  bool *mergejoin_allowed);
+
+/*
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ */
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *baserel,
+										List* unique_index_list);
+extern void populate_partitionedrel_uniquekeys(PlannerInfo *root,
+												RelOptInfo *rel,
+												List *childrels);
+extern void populate_distinctrel_uniquekeys(PlannerInfo *root,
+											RelOptInfo *inputrel,
+											RelOptInfo *distinctrel);
+extern void populate_grouprel_uniquekeys(PlannerInfo *root,
+										 RelOptInfo *grouprel,
+										 RelOptInfo *inputrel);
+extern void populate_unionrel_uniquekeys(PlannerInfo *root,
+										  RelOptInfo *unionrel);
+extern void simple_copy_uniquekeys(RelOptInfo *oldrel,
+								   RelOptInfo *newrel);
+extern void convert_subquery_uniquekeys(PlannerInfo *root,
+										RelOptInfo *currel,
+										RelOptInfo *sub_final_rel);
+extern void populate_joinrel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *joinrel,
+										RelOptInfo *rel1,
+										RelOptInfo *rel2,
+										List *restrictlist,
+										JoinType jointype);
+
+extern bool relation_has_uniquekeys_for(PlannerInfo *root,
+										RelOptInfo *rel,
+										List *exprs,
+										bool allow_multinulls);
+extern bool relation_is_onerow(RelOptInfo *rel);
 
 #endif							/* PATHS_H */
-- 
2.21.0

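For reviewers who want to eyeball the subset semantics that relation_has_uniquekeys_for and the multi_nullvals handling rely on, here is a minimal standalone C sketch. Expressions are modeled as plain strings and all names (UniqueKeySketch, rel_has_uniquekeys_for) are illustrative only, not part of the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A toy UniqueKey: a NULL-terminated list of column names plus the flag. */
typedef struct UniqueKeySketch
{
	const char **exprs;
	bool		multi_nullvals;
} UniqueKeySketch;

/* Mimics list_member: is 'item' in the NULL-terminated 'list'? */
static bool
sketch_list_member(const char **list, const char *item)
{
	for (; *list; list++)
		if (strcmp(*list, item) == 0)
			return true;
	return false;
}

/* Mimics list_is_subset: is every member also in target? */
static bool
sketch_list_is_subset(const char **members, const char **target)
{
	for (; *members; members++)
		if (!sketch_list_member(target, *members))
			return false;
	return true;
}

/*
 * Mimics relation_has_uniquekeys_for: the rel cannot return duplicate rows
 * over 'exprs' if some qualifying UniqueKey's exprs are a subset of 'exprs'.
 * Keys with multi_nullvals are skipped unless the caller allows them.
 */
static bool
rel_has_uniquekeys_for(const UniqueKeySketch *keys, size_t nkeys,
					   const char **exprs, bool allow_multinulls)
{
	if (*exprs == NULL)			/* degenerate request, as in the patch */
		return false;
	for (size_t i = 0; i < nkeys; i++)
	{
		if (keys[i].multi_nullvals && !allow_multinulls)
			continue;
		if (sketch_list_is_subset(keys[i].exprs, exprs))
			return true;
	}
	return false;
}
```

This is only a model of the control flow, of course; the real code works on Expr nodes with equal() and carries the one-row case as exprs == NIL.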
From b2852ab23e264b9088bd9e6e9d72df9d35144980 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 6 May 2020 16:32:28 +0800
Subject: [PATCH v11 5/6] Treat the input as sorted if the group by clause is
 unique already.

In the previous commit, if there is no aggregate call, the Group node is
removed entirely, so this patch only covers the cases where aggregate
calls exist.
---
 src/backend/commands/explain.c           |   4 +
 src/backend/executor/nodeAgg.c           |   6 ++
 src/backend/optimizer/plan/createplan.c  |   3 +-
 src/backend/optimizer/plan/planner.c     | 127 +++++++++++++++--------
 src/include/nodes/nodes.h                |   3 +-
 src/include/nodes/pathnodes.h            |   1 +
 src/include/nodes/plannodes.h            |   1 +
 src/test/regress/expected/aggregates.out |  42 ++++++++
 src/test/regress/sql/aggregates.sql      |  17 +++
 9 files changed, 159 insertions(+), 45 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c98c9b5547..393c766fdc 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1918,6 +1918,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			show_agg_keys(castNode(AggState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
 			show_hashagg_info((AggState *) planstate, es);
+			if (es->format != EXPLAIN_FORMAT_TEXT ||
+				(es->verbose && ((Agg *) plan)->input_unique))
+				ExplainPropertyBool("Input Unique",
+									((Agg *) plan)->input_unique, es);
 			if (plan->qual)
 				show_instrumentation_count("Rows Removed by Filter", 1,
 										   planstate, es);
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 9776263ae7..14a4627b8b 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -2170,6 +2170,12 @@ ExecAgg(PlanState *pstate)
 			case AGG_SORTED:
 				result = agg_retrieve_direct(node);
 				break;
+			case AGG_UNIQUE:
+				/* AGG_UNIQUE is translated to AGG_SORTED; handle it
+				 * here just to keep the compiler quiet.
+				 */
+				Assert(false);
+				break;
 		}
 
 		if (!TupIsNull(result))
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 99278eed93..c7481436e4 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -6355,7 +6355,7 @@ make_agg(List *tlist, List *qual,
 	/* Reduce to long, but 'ware overflow! */
 	numGroups = (long) Min(dNumGroups, (double) LONG_MAX);
 
-	node->aggstrategy = aggstrategy;
+	node->aggstrategy = aggstrategy == AGG_UNIQUE ? AGG_SORTED : aggstrategy;
 	node->aggsplit = aggsplit;
 	node->numCols = numGroupCols;
 	node->grpColIdx = grpColIdx;
@@ -6366,6 +6366,7 @@ make_agg(List *tlist, List *qual,
 	node->aggParams = NULL;		/* SS_finalize_plan() will fill this */
 	node->groupingSets = groupingSets;
 	node->chain = chain;
+	node->input_unique = aggstrategy == AGG_UNIQUE;
 
 	plan->qual = qual;
 	plan->targetlist = tlist;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f5eb72b0ae..f07e8336d2 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3852,51 +3852,61 @@ create_grouping_paths(PlannerInfo *root,
 		int			flags = 0;
 		GroupPathExtraData extra;
 
-		/*
-		 * Determine whether it's possible to perform sort-based
-		 * implementations of grouping.  (Note that if groupClause is empty,
-		 * grouping_is_sortable() is trivially true, and all the
-		 * pathkeys_contained_in() tests will succeed too, so that we'll
-		 * consider every surviving input path.)
-		 *
-		 * If we have grouping sets, we might be able to sort some but not all
-		 * of them; in this case, we need can_sort to be true as long as we
-		 * must consider any sorted-input plan.
-		 */
-		if ((gd && gd->rollups != NIL)
-			|| grouping_is_sortable(parse->groupClause))
-			flags |= GROUPING_CAN_USE_SORT;
+		if (group_unique_input)
+		{
+			/* In this case we don't need to set other flags */
+			Assert(parse->groupClause != NIL);
+			Assert(gd == NULL);
+			flags |= GROUPING_INPUT_UNIQUE;
+		}
+		else
+		{
+			/*
+			 * Determine whether it's possible to perform sort-based
+			 * implementations of grouping.  (Note that if groupClause is empty,
+			 * grouping_is_sortable() is trivially true, and all the
+			 * pathkeys_contained_in() tests will succeed too, so that we'll
+			 * consider every surviving input path.)
+			 *
+			 * If we have grouping sets, we might be able to sort some but not all
+			 * of them; in this case, we need can_sort to be true as long as we
+			 * must consider any sorted-input plan.
+			 */
+			if ((gd && gd->rollups != NIL)
+				|| grouping_is_sortable(parse->groupClause))
+				flags |= GROUPING_CAN_USE_SORT;
 
-		/*
-		 * Determine whether we should consider hash-based implementations of
-		 * grouping.
-		 *
-		 * Hashed aggregation only applies if we're grouping. If we have
-		 * grouping sets, some groups might be hashable but others not; in
-		 * this case we set can_hash true as long as there is nothing globally
-		 * preventing us from hashing (and we should therefore consider plans
-		 * with hashes).
-		 *
-		 * Executor doesn't support hashed aggregation with DISTINCT or ORDER
-		 * BY aggregates.  (Doing so would imply storing *all* the input
-		 * values in the hash table, and/or running many sorts in parallel,
-		 * either of which seems like a certain loser.)  We similarly don't
-		 * support ordered-set aggregates in hashed aggregation, but that case
-		 * is also included in the numOrderedAggs count.
-		 *
-		 * Note: grouping_is_hashable() is much more expensive to check than
-		 * the other gating conditions, so we want to do it last.
-		 */
-		if ((parse->groupClause != NIL &&
-			 agg_costs->numOrderedAggs == 0 &&
-			 (gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
-			flags |= GROUPING_CAN_USE_HASH;
+			/*
+			 * Determine whether we should consider hash-based implementations of
+			 * grouping.
+			 *
+			 * Hashed aggregation only applies if we're grouping. If we have
+			 * grouping sets, some groups might be hashable but others not; in
+			 * this case we set can_hash true as long as there is nothing globally
+			 * preventing us from hashing (and we should therefore consider plans
+			 * with hashes).
+			 *
+			 * Executor doesn't support hashed aggregation with DISTINCT or ORDER
+			 * BY aggregates.  (Doing so would imply storing *all* the input
+			 * values in the hash table, and/or running many sorts in parallel,
+			 * either of which seems like a certain loser.)  We similarly don't
+			 * support ordered-set aggregates in hashed aggregation, but that case
+			 * is also included in the numOrderedAggs count.
+			 *
+			 * Note: grouping_is_hashable() is much more expensive to check than
+			 * the other gating conditions, so we want to do it last.
+			 */
+			if ((parse->groupClause != NIL &&
+				 agg_costs->numOrderedAggs == 0 &&
+				 (gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
+				flags |= GROUPING_CAN_USE_HASH;
 
-		/*
-		 * Determine whether partial aggregation is possible.
-		 */
-		if (can_partial_agg(root, agg_costs))
-			flags |= GROUPING_CAN_PARTIAL_AGG;
+			/*
+			 * Determine whether partial aggregation is possible.
+			 */
+			if (can_partial_agg(root, agg_costs))
+				flags |= GROUPING_CAN_PARTIAL_AGG;
+		}
 
 		extra.flags = flags;
 		extra.target_parallel_safe = target_parallel_safe;
@@ -6517,9 +6527,40 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	ListCell   *lc;
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+	bool		group_input_unique = (extra->flags & GROUPING_INPUT_UNIQUE) != 0;
 	List	   *havingQual = (List *) extra->havingQual;
 	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
 
+	if (group_input_unique)
+	{
+		Path *path = input_rel->cheapest_total_path;
+		add_path(grouped_rel, (Path *) create_agg_path(root,
+													   grouped_rel,
+													   path,
+													   grouped_rel->reltarget,
+													   AGG_UNIQUE,
+													   AGGSPLIT_SIMPLE,
+													   parse->groupClause,
+													   havingQual,
+													   agg_costs,
+													   dNumGroups));
+
+		if (path != input_rel->cheapest_startup_path)
+		{
+			path = input_rel->cheapest_startup_path;
+			add_path(grouped_rel, (Path *) create_agg_path(root,
+														   grouped_rel,
+														   path,
+														   grouped_rel->reltarget,
+														   AGG_UNIQUE,
+														   AGGSPLIT_SIMPLE,
+														   parse->groupClause,
+														   havingQual,
+														   agg_costs,
+														   dNumGroups));
+		}
+		return;
+	}
 	if (can_sort)
 	{
 		/*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 41110ed888..010266ed4b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -761,7 +761,8 @@ typedef enum AggStrategy
 	AGG_PLAIN,					/* simple agg across all input rows */
 	AGG_SORTED,					/* grouped agg, input must be sorted */
 	AGG_HASHED,					/* grouped agg, use internal hashtable */
-	AGG_MIXED					/* grouped agg, hash and sort both used */
+	AGG_MIXED,					/* grouped agg, hash and sort both used */
+	AGG_UNIQUE					/* grouped agg, input already unique on group clause */
 } AggStrategy;
 
 /*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 02e4458bef..b4bb4e6267 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2472,6 +2472,7 @@ typedef struct JoinPathExtraData
 #define GROUPING_CAN_USE_SORT       0x0001
 #define GROUPING_CAN_USE_HASH       0x0002
 #define GROUPING_CAN_PARTIAL_AGG	0x0004
+#define GROUPING_INPUT_UNIQUE		0x0008
 
 /*
  * What kind of partitionwise aggregation is in use?
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 83e01074ed..be31020a40 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -828,6 +828,7 @@ typedef struct Agg
 	/* Note: planner provides numGroups & aggParams only in HASHED/MIXED case */
 	List	   *groupingSets;	/* grouping sets to use */
 	List	   *chain;			/* chained Agg/Sort nodes */
+	bool		input_unique;   /* The input is unique already */
 } Agg;
 
 /* ----------------
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index f4c4a21bc8..94ee842602 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2753,6 +2753,46 @@ set work_mem to default;
 ----+----+----
 (0 rows)
 
+create table agg_unique_1(pk int primary key,  b int);
+create table agg_unique_2(a int, unsortable_col xid);
+insert into agg_unique_2 values(1, '1'), (2, '2'), (2, '1');
+explain (costs off, verbose)  select pk, sum(b) from agg_unique_1
+group by pk;
+              QUERY PLAN               
+---------------------------------------
+ GroupAggregate
+   Output: pk, sum(b)
+   Group Key: agg_unique_1.pk
+   Input Unique: true
+   ->  Seq Scan on public.agg_unique_1
+         Output: pk, b
+(6 rows)
+
+explain (costs off, verbose) select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+                            QUERY PLAN                             
+-------------------------------------------------------------------
+ GroupAggregate
+   Output: agg_unique_2.unsortable_col, count(*)
+   Group Key: agg_unique_2.unsortable_col
+   Input Unique: true
+   ->  HashAggregate
+         Output: agg_unique_2.unsortable_col
+         Group Key: agg_unique_2.unsortable_col
+         ->  Seq Scan on public.agg_unique_2
+               Output: agg_unique_2.a, agg_unique_2.unsortable_col
+(9 rows)
+
+select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+ unsortable_col | count 
+----------------+-------
+              2 |     1
+              1 |     1
+(2 rows)
+
 drop table agg_group_1;
 drop table agg_group_2;
 drop table agg_group_3;
@@ -2761,3 +2801,5 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_unique_1;
+drop table agg_unique_2;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index f4d7f7057f..7602e1d1d2 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1216,6 +1216,21 @@ set work_mem to default;
   union all
 (select * from agg_group_4 except select * from agg_hash_4);
 
+create table agg_unique_1(pk int primary key,  b int);
+create table agg_unique_2(a int, unsortable_col xid);
+insert into agg_unique_2 values(1, '1'), (2, '2'), (2, '1');
+
+explain (costs off, verbose)  select pk, sum(b) from agg_unique_1
+group by pk;
+
+explain (costs off, verbose) select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+
+select unsortable_col, count(*)
+from (select distinct unsortable_col from agg_unique_2) t
+group by unsortable_col;
+
 drop table agg_group_1;
 drop table agg_group_2;
 drop table agg_group_3;
@@ -1224,3 +1239,5 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_unique_1;
+drop table agg_unique_2;
-- 
2.21.0

Attachment: v11-0003-Refactoring-existing-uniqueness-related-code-to-.patch
From ff772be3a7016e6918277958b2612a38d8d4a05e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 6 May 2020 15:44:15 +0800
Subject: [PATCH v11 3/6] Refactoring existing uniqueness related code to use
 UniqueKey

The calls of remove_useless_joins and reduce_unique_semijoins are
postponed as a consequence of this change as well.
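
As an illustrative sketch (not taken verbatim from this patch's regression
tests; table names are hypothetical), the classic left-join removal that this
refactoring must keep working looks like:

```sql
-- b.id is a primary key and the query needs no columns of b, so the
-- left join contributes nothing and the planner can drop it entirely.
CREATE TABLE a (id int PRIMARY KEY, b_id int);
CREATE TABLE b (id int PRIMARY KEY, c_id int);

EXPLAIN (COSTS OFF)
SELECT a.id FROM a LEFT JOIN b ON a.b_id = b.id;
-- expected to show a plain "Seq Scan on a", with the join removed
```

With this patch the uniqueness proof comes from b's UniqueKey rather than
from a direct lookup of its unique indexes in analyzejoins.c.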
---
 .../postgres_fdw/expected/postgres_fdw.out    |  32 ++--
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   1 +
 src/backend/optimizer/path/allpaths.c         |  18 ++-
 src/backend/optimizer/plan/analyzejoins.c     | 137 ++++--------------
 src/backend/optimizer/plan/planmain.c         |  13 --
 src/test/regress/expected/join.out            |  59 ++++----
 src/test/regress/sql/join.sql                 |  16 +-
 7 files changed, 97 insertions(+), 179 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 84bc0ee381..7b55b8c726 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1378,6 +1378,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1
 (8 rows)
 
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY:  The inner table in the plan is "T 3", which has a primary key on c1; that's why we see the newly added "Inner Unique: true".
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
                                                                                                                                                                                              QUERY PLAN                                                                                                                                                                                             
@@ -1386,6 +1387,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
    Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
    ->  Nested Loop
          Output: "T 3".c1, ft4.c1, ft5.c1, "T 3".ctid, ft4.*, ft5.*
+         Inner Unique: true
          ->  Foreign Scan
                Output: ft4.c1, ft4.*, ft5.c1, ft5.*
                Relations: (public.ft4) FULL JOIN (public.ft5)
@@ -1410,7 +1412,7 @@ SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNE
                ->  Seq Scan on "S 1"."T 3"
                      Output: "T 3".c1, "T 3".ctid
                      Filter: ("T 3".c1 = 50)
-(28 rows)
+(29 rows)
 
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
  c1 | a  | b  
@@ -2902,22 +2904,20 @@ select sum(c1%3), sum(distinct c1%3 order by c1%3) filter (where c1%3 < 2), c2 f
 -- Outer query is aggregation query
 explain (verbose, costs off)
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
-                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
- Unique
+                                                       QUERY PLAN                                                       
+------------------------------------------------------------------------------------------------------------------------
+ Sort
    Output: ((SubPlan 1))
-   ->  Sort
-         Output: ((SubPlan 1))
-         Sort Key: ((SubPlan 1))
-         ->  Foreign Scan
-               Output: (SubPlan 1)
-               Relations: Aggregate on (public.ft2 t2)
-               Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
-               SubPlan 1
-                 ->  Foreign Scan on public.ft1 t1
-                       Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
-                       Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
-(13 rows)
+   Sort Key: ((SubPlan 1))
+   ->  Foreign Scan
+         Output: (SubPlan 1)
+         Relations: Aggregate on (public.ft2 t2)
+         Remote SQL: SELECT count(*) FILTER (WHERE ((c2 = 6) AND ("C 1" < 10))) FROM "S 1"."T 1" WHERE (((c2 % 6) = 0))
+         SubPlan 1
+           ->  Foreign Scan on public.ft1 t1
+                 Output: (count(*) FILTER (WHERE ((t2.c2 = 6) AND (t2.c1 < 10))))
+                 Remote SQL: SELECT NULL FROM "S 1"."T 1" WHERE (("C 1" = 6))
+(11 rows)
 
 select distinct (select count(*) filter (where t2.c2 = 6 and t2.c1 < 10) from ft1 t1 where t1.c1 = 6) from ft2 t2 where t2.c2 % 6 = 0 order by 1;
  count 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index d452d06343..c16dbf05a8 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -453,6 +453,7 @@ EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t1 FULL JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (t1.c1 = ss.a) ORDER BY t1.c1, ss.a, ss.b;
 -- d. test deparsing rowmarked relations as subqueries
+-- YYY:  The inner table in the plan is "T 3", which has a primary key on c1; that's why we see the newly added "Inner Unique: true".
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
 SELECT t1.c1, ss.a, ss.b FROM (SELECT c1 FROM "S 1"."T 3" WHERE c1 = 50) t1 INNER JOIN (SELECT t2.c1, t3.c1 FROM (SELECT c1 FROM ft4 WHERE c1 between 50 and 60) t2 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 between 50 and 60) t3 ON (t2.c1 = t3.c1) WHERE t2.c1 IS NULL OR t2.c1 IS NOT NULL) ss(a, b) ON (TRUE) ORDER BY t1.c1, ss.a, ss.b FOR UPDATE OF t1;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 66d246fa1a..69fb511e0d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -39,6 +39,7 @@
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planmain.h"
 #include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
@@ -222,13 +223,24 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	set_base_rel_pathlists(root);
 
 	/*
-	 * Generate access paths for the entire join tree.
+	 * Remove any useless outer joins.  Ideally this would be done during
+	 * jointree preprocessing, but the necessary information isn't available
+	 * until we've built baserel data structures, classified qual clauses,
+	 * and computed uniquekeys.
 	 */
-	rel = make_rel_from_joinlist(root, joinlist);
+	joinlist = remove_useless_joins(root, joinlist);
+
+	/*
+	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
+	 * Likewise, this can't be done until now for lack of needed info.
+	 */
+	reduce_unique_semijoins(root);
 
 	/*
-	 * The result should join all and only the query's base rels.
+	 * Generate access paths for the entire join tree.
 	 */
+	rel = make_rel_from_joinlist(root, joinlist);
+
 	Assert(bms_equal(rel->relids, root->all_baserels));
 
 	return rel;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index d0ff660284..d80bff65d2 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -439,6 +439,8 @@ remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
 	 * There may be references to the rel in root->fkey_list, but if so,
 	 * match_foreign_keys_to_quals() will get rid of them.
 	 */
+
+	root->all_baserels = bms_del_member(root->all_baserels, relid);
 }
 
 /*
@@ -584,39 +586,7 @@ reduce_unique_semijoins(PlannerInfo *root)
 static bool
 rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 {
-	/* We only know about baserels ... */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * For a plain relation, we only know how to prove uniqueness by
-		 * reference to unique indexes.  Make sure there's at least one
-		 * suitable unique index.  It must be immediately enforced, and if
-		 * it's a partial index, it must match the query.  (Keep these
-		 * conditions in sync with relation_has_unique_index_for!)
-		 */
-		ListCell   *lc;
-
-		foreach(lc, rel->indexlist)
-		{
-			IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
-
-			if (ind->unique && ind->immediate &&
-				(ind->indpred == NIL || ind->predOK))
-				return true;
-		}
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Query	   *subquery = root->simple_rte_array[rel->relid]->subquery;
-
-		/* Check if the subquery has any qualities that support distinctness */
-		if (query_supports_distinctness(subquery))
-			return true;
-	}
-	/* We have no proof rules for any other rtekinds. */
-	return false;
+	return rel->uniquekeys != NIL;
 }
 
 /*
@@ -640,83 +610,33 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
 static bool
 rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List *clause_list)
 {
-	/*
-	 * We could skip a couple of tests here if we assume all callers checked
-	 * rel_supports_distinctness first, but it doesn't seem worth taking any
-	 * risk for.
-	 */
-	if (rel->reloptkind != RELOPT_BASEREL)
-		return false;
-	if (rel->rtekind == RTE_RELATION)
-	{
-		/*
-		 * Examine the indexes to see if we have a matching unique index.
-		 * relation_has_unique_index_for automatically adds any usable
-		 * restriction clauses for the rel, so we needn't do that here.
-		 */
-		if (relation_has_unique_index_for(root, rel, clause_list, NIL, NIL))
-			return true;
-	}
-	else if (rel->rtekind == RTE_SUBQUERY)
-	{
-		Index		relid = rel->relid;
-		Query	   *subquery = root->simple_rte_array[relid]->subquery;
-		List	   *colnos = NIL;
-		List	   *opids = NIL;
-		ListCell   *l;
 
-		/*
-		 * Build the argument lists for query_is_distinct_for: a list of
-		 * output column numbers that the query needs to be distinct over, and
-		 * a list of equality operators that the output columns need to be
-		 * distinct according to.
-		 *
-		 * (XXX we are not considering restriction clauses attached to the
-		 * subquery; is that worth doing?)
-		 */
-		foreach(l, clause_list)
+	ListCell	*lc1, *lc2, *lc3;
+	foreach(lc1, rel->uniquekeys)
+	{
+		UniqueKey *uqk = lfirst_node(UniqueKey, lc1);
+		bool all_uqk_exprs_found = true;
+		foreach(lc2, uqk->exprs)
 		{
-			RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
-			Oid			op;
-			Var		   *var;
-
-			/*
-			 * Get the equality operator we need uniqueness according to.
-			 * (This might be a cross-type operator and thus not exactly the
-			 * same operator the subquery would consider; that's all right
-			 * since query_is_distinct_for can resolve such cases.)  The
-			 * caller's mergejoinability test should have selected only
-			 * OpExprs.
-			 */
-			op = castNode(OpExpr, rinfo->clause)->opno;
-
-			/* caller identified the inner side for us */
-			if (rinfo->outer_is_left)
-				var = (Var *) get_rightop(rinfo->clause);
-			else
-				var = (Var *) get_leftop(rinfo->clause);
-
-			/*
-			 * We may ignore any RelabelType node above the operand.  (There
-			 * won't be more than one, since eval_const_expressions() has been
-			 * applied already.)
-			 */
-			if (var && IsA(var, RelabelType))
-				var = (Var *) ((RelabelType *) var)->arg;
-
-			/*
-			 * If inner side isn't a Var referencing a subquery output column,
-			 * this clause doesn't help us.
-			 */
-			if (!var || !IsA(var, Var) ||
-				var->varno != relid || var->varlevelsup != 0)
-				continue;
-
-			colnos = lappend_int(colnos, var->varattno);
-			opids = lappend_oid(opids, op);
+			Node *uq_expr = lfirst(lc2);
+			bool find_uq_exprs_in_clause_list = false;
+			foreach(lc3, clause_list)
+			{
+				RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc3);
+				Node *clause_expr = rinfo->outer_is_left ? get_rightop(rinfo->clause) : get_leftop(rinfo->clause);
+				if (equal(uq_expr, clause_expr))
+				{
+					find_uq_exprs_in_clause_list = true;
+					break;
+				}
+			}
+			if (!find_uq_exprs_in_clause_list)
+			{
+				all_uqk_exprs_found = false;
+				break;
+			}
 		}
-
-		if (query_is_distinct_for(subquery, colnos, opids))
+		if (all_uqk_exprs_found)
 			return true;
 	}
 	return false;
@@ -972,6 +892,9 @@ innerrel_is_unique(PlannerInfo *root,
 	MemoryContext old_context;
 	ListCell   *lc;
 
+	if (relation_is_onerow(innerrel))
+		return true;
+
 	/* Certainly can't prove uniqueness when there are no joinclauses */
 	if (restrictlist == NIL)
 		return false;
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 62dfc6d44a..6ad73cb57b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -213,19 +213,6 @@ query_planner(PlannerInfo *root,
 	 */
 	fix_placeholder_input_needed_levels(root);
 
-	/*
-	 * Remove any useless outer joins.  Ideally this would be done during
-	 * jointree preprocessing, but the necessary information isn't available
-	 * until we've built baserel data structures and classified qual clauses.
-	 */
-	joinlist = remove_useless_joins(root, joinlist);
-
-	/*
-	 * Also, reduce any semijoins with unique inner rels to plain inner joins.
-	 * Likewise, this can't be done until now for lack of needed info.
-	 */
-	reduce_unique_semijoins(root);
-
 	/*
 	 * Now distribute "placeholders" to base rels as needed.  This has to be
 	 * done after join removal because removal could change whether a
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index a46b1573bd..8378936eda 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4349,11 +4349,11 @@ reset enable_nestloop;
 --
 begin;
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 -- all three cases should be optimizable into a simple seqscan
@@ -4411,40 +4411,37 @@ select d.* from d left join (select distinct * from b) s
 (1 row)
 
 -- join removal is not possible when the GROUP BY contains a column that is
--- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
--- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
--- but this happens too late for join removal in the outer plan level.)
+-- not in the join condition.
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
-                QUERY PLAN                
-------------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Group
-         Group Key: b.id
-         ->  Index Scan using b_pkey on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.d, b.c_id
+                     ->  Seq Scan on b
 (8 rows)
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
- Merge Right Join
-   Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
-   ->  Sort
-         Sort Key: d.a
-         ->  Seq Scan on d
-(9 rows)
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
+                 QUERY PLAN                 
+--------------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = s.d)
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Subquery Scan on s
+               ->  HashAggregate
+                     Group Key: b.c_id, b.d
+                     ->  Seq Scan on b
+(8 rows)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 1403e0ffe7..3312542411 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1479,11 +1479,11 @@ reset enable_nestloop;
 begin;
 
 CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
-CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
+CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int, d int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
 INSERT INTO a VALUES (0, 0), (1, NULL);
-INSERT INTO b VALUES (0, 0), (1, NULL);
+INSERT INTO b VALUES (0, 0, 1), (1, NULL, 1);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
 
@@ -1512,17 +1512,15 @@ select d.* from d left join (select distinct * from b) s
   on d.a = s.id and d.b = s.c_id;
 
 -- join removal is not possible when the GROUP BY contains a column that is
--- not in the join condition.  (Note: as of 9.6, we notice that b.id is a
--- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
--- but this happens too late for join removal in the outer plan level.)
+-- not in the join condition.
 explain (costs off)
-select d.* from d left join (select * from b group by b.id, b.c_id) s
-  on d.a = s.id;
+select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
+  on d.a = s.d;
 
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
-select d.* from d left join (select distinct * from b) s
-  on d.a = s.id;
+select d.* from d left join (select distinct c_id, d from b) s
+  on d.a = s.d;
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
-- 
2.21.0

Attachment: v11-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch
From 15caec04b45ce57f76423cba0dee4e461a55f257 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sun, 3 May 2020 22:37:46 +0800
Subject: [PATCH v11 1/6] Introduce RelOptInfo->notnullattrs attribute

The notnullattrs set is calculated from the catalog and from run-time
quals. That information is also translated to child relations for
partitioned tables.
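
A minimal sketch of the two sources this tracks (illustrative only; the
table is hypothetical):

```sql
CREATE TABLE t (a int NOT NULL, b int);

-- a is known not-null from the catalog (attnotnull).  For the query
-- below, b is additionally known not-null at run time, because the
-- strict qual b > 0 rejects NULLs before the rest of the plan sees them.
SELECT a, b FROM t WHERE b > 0;
```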
---
 src/backend/optimizer/path/allpaths.c  | 31 ++++++++++++++++++++++++++
 src/backend/optimizer/plan/initsplan.c | 10 +++++++++
 src/backend/optimizer/util/plancat.c   | 10 +++++++++
 src/include/nodes/pathnodes.h          |  2 ++
 4 files changed, 53 insertions(+)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b399592ff8..754f6d64f6 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1009,6 +1009,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		RelOptInfo *childrel;
 		ListCell   *parentvars;
 		ListCell   *childvars;
+		int i = -1;
 
 		/* append_rel_list contains all append rels; ignore others */
 		if (appinfo->parent_relid != parentRTindex)
@@ -1065,6 +1066,36 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 								   (Node *) rel->reltarget->exprs,
 								   1, &appinfo);
 
+		/* Copy notnullattrs. */
+		while ((i = bms_next_member(rel->notnullattrs, i)) > 0)
+		{
+			AttrNumber attno = i + FirstLowInvalidHeapAttributeNumber;
+			AttrNumber child_attno;
+			if (attno == 0)
+			{
+				/* Whole row is not null, so must be same for child */
+				childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+														attno - FirstLowInvalidHeapAttributeNumber);
+				break;
+			}
+			if (attno < 0)
+				/* no need to translate system column */
+				child_attno = attno;
+			else
+			{
+				Node * node = list_nth(appinfo->translated_vars, attno - 1);
+				if (!IsA(node, Var))
+					/* This may happen in the UNION case, e.g. (SELECT a FROM t1 UNION SELECT a + 3
+					 * FROM t2) t, where we know t.a is not null
+					 */
+					continue;
+				child_attno = castNode(Var, node)->varattno;
+			}
+
+			childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+													child_attno - FirstLowInvalidHeapAttributeNumber);
+		}
+
 		/*
 		 * We have to make child entries in the EquivalenceClass data
 		 * structures as well.  This is needed either if the parent
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e978b491f6..95b1b14cd3 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -830,6 +830,16 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		{
 			Node	   *qual = (Node *) lfirst(l);
 
+			/* Set the not null info now */
+			ListCell	*lc;
+			List		*non_nullable_vars = find_nonnullable_vars(qual);
+			foreach(lc, non_nullable_vars)
+			{
+				Var *var = lfirst_node(Var, lc);
+				RelOptInfo *rel = root->simple_rel_array[var->varno];
+				rel->notnullattrs = bms_add_member(rel->notnullattrs,
+												   var->varattno - FirstLowInvalidHeapAttributeNumber);
+			}
 			distribute_qual_to_rels(root, qual,
 									false, below_outer_join, JOIN_INNER,
 									root->qual_security_level,
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index f9d0d67aa7..bdd4876a84 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -463,6 +464,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->notnullattrs == NULL);
+	for(i = 0; i < relation->rd_att->natts; i++)
+	{
+		FormData_pg_attribute attr = relation->rd_att->attrs[i];
+		if (attr.attnotnull)
+			rel->notnullattrs = bms_add_member(rel->notnullattrs,
+											   attr.attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 485d1b06c9..9e3ebd488a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -709,6 +709,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	/* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */
+	Bitmapset		*notnullattrs;
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
-- 
2.21.0

Attachment: v11-0004-Remove-distinct-node-AggNode-if-the-input-is-uni.patch
From 19f07073edab4aa9245b482126743c8d41e2311e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 11 May 2020 15:54:49 +0800
Subject: [PATCH v11 4/6] Remove distinct node & AggNode if the input is unique
 already.
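
An illustrative sketch of the intended effect (mirroring the pattern the
new select_distinct tests exercise; the table name is hypothetical):

```sql
CREATE TABLE t1 (pk int PRIMARY KEY, b int);

-- pk is unique, so every row of t1 is already distinct on (pk, b);
-- with this patch the Unique/Aggregate node should disappear and the
-- plan should reduce to a plain sequential scan of t1.
EXPLAIN (COSTS OFF) SELECT DISTINCT pk, b FROM t1;
```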

---
 src/backend/optimizer/plan/planner.c          |  30 +-
 src/test/regress/expected/aggregates.out      |  73 ++--
 src/test/regress/expected/select_distinct.out | 388 ++++++++++++++++++
 src/test/regress/sql/aggregates.sql           |   6 +-
 src/test/regress/sql/select_distinct.sql      | 146 +++++++
 5 files changed, 593 insertions(+), 50 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 6dc65faa11..f5eb72b0ae 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3812,7 +3812,27 @@ create_grouping_paths(PlannerInfo *root,
 {
 	Query	   *parse = root->parse;
 	RelOptInfo *grouped_rel;
+	bool	group_unique_input = false;
 	RelOptInfo *partially_grouped_rel;
+	List	*groupExprs = NIL;
+
+	if (root->parse->groupingSets == NIL)
+	{
+		groupExprs = get_sortgrouplist_exprs(parse->groupClause,
+											 parse->targetList);
+		/*
+		 * If the GROUP BY clauses are unique already, the grouping node is
+		 * not necessary when there are no aggregate functions.
+		 */
+		group_unique_input = relation_has_uniquekeys_for(root, input_rel,
+														 groupExprs, false);
+		if (group_unique_input &&
+			groupExprs != NIL &&
+			!parse->hasAggs &&
+			!parse->hasWindowFuncs &&
+			parse->havingQual == NULL)
+			return input_rel;
+	}
 
 	/*
 	 * Create grouping relation to hold fully aggregated grouping and/or
@@ -4739,6 +4759,12 @@ create_distinct_paths(PlannerInfo *root,
 	bool		allow_hash;
 	Path	   *path;
 	ListCell   *lc;
+	List	   *distinctExprs =  get_sortgrouplist_exprs(parse->distinctClause,
+														 parse->targetList);
+
+	/* If the result is unique already, we just return the input_rel directly */
+	if (relation_has_uniquekeys_for(root, input_rel, distinctExprs, false))
+		return input_rel;
 
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
@@ -4776,10 +4802,6 @@ create_distinct_paths(PlannerInfo *root,
 		/*
 		 * Otherwise, the UNIQUE filter has effects comparable to GROUP BY.
 		 */
-		List	   *distinctExprs;
-
-		distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
-												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
 											  NULL);
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 477fd1205c..f4c4a21bc8 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1027,14 +1027,12 @@ explain (costs off)
   select distinct max(unique2) from tenk1;
                              QUERY PLAN                              
 ---------------------------------------------------------------------
- HashAggregate
-   Group Key: $0
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Index Only Scan Backward using tenk1_unique2 on tenk1
                  Index Cond: (unique2 IS NOT NULL)
-   ->  Result
-(7 rows)
+(5 rows)
 
 select distinct max(unique2) from tenk1;
  max  
@@ -1193,7 +1191,7 @@ explain (costs off)
   select distinct min(f1), max(f1) from minmaxtest;
                                          QUERY PLAN                                          
 ---------------------------------------------------------------------------------------------
- Unique
+ Result
    InitPlan 1 (returns $0)
      ->  Limit
            ->  Merge Append
@@ -1216,10 +1214,7 @@ explain (costs off)
                  ->  Index Only Scan using minmaxtest2i on minmaxtest2 minmaxtest_8
                        Index Cond: (f1 IS NOT NULL)
                  ->  Index Only Scan Backward using minmaxtest3i on minmaxtest3 minmaxtest_9
-   ->  Sort
-         Sort Key: ($0), ($1)
-         ->  Result
-(26 rows)
+(23 rows)
 
 select distinct min(f1), max(f1) from minmaxtest;
  min | max 
@@ -1249,12 +1244,10 @@ create temp table t2 (x int, y int, z int, primary key (x, y));
 create temp table t3 (a int, b int, c int, primary key(a, b) deferrable);
 -- Non-primary-key columns can be removed from GROUP BY
 explain (costs off) select * from t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 -- No removal can happen if the complete PK is not present in GROUP BY
 explain (costs off) select a,c from t1 group by a,c,d;
@@ -1266,29 +1259,27 @@ explain (costs off) select a,c from t1 group by a,c,d;
 (3 rows)
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
-                      QUERY PLAN                      
-------------------------------------------------------
- HashAggregate
-   Group Key: t1.a, t1.b, t2.x, t2.y
-   ->  Hash Join
-         Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
-         ->  Seq Scan on t2
-         ->  Hash
-               ->  Seq Scan on t1
-(7 rows)
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
+                   QUERY PLAN                   
+------------------------------------------------
+ Hash Join
+   Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
+   ->  Seq Scan on t2
+   ->  Hash
+         ->  Seq Scan on t1
+(5 rows)
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
                       QUERY PLAN                      
 ------------------------------------------------------
  HashAggregate
    Group Key: t1.a, t1.b, t2.x, t2.z
-   ->  Hash Join
+   ->  Hash Left Join
          Hash Cond: ((t2.x = t1.a) AND (t2.y = t1.b))
          ->  Seq Scan on t2
          ->  Hash
@@ -1318,12 +1309,10 @@ explain (costs off) select * from t1 group by a,b,c,d;
 
 -- Okay to remove columns if we're only querying the parent.
 explain (costs off) select * from only t1 group by a,b,c,d;
-      QUERY PLAN      
-----------------------
- HashAggregate
-   Group Key: a, b
-   ->  Seq Scan on t1
-(3 rows)
+   QUERY PLAN   
+----------------
+ Seq Scan on t1
+(1 row)
 
 create temp table p_t1 (
   a int,
@@ -1336,14 +1325,12 @@ create temp table p_t1_1 partition of p_t1 for values in(1);
 create temp table p_t1_2 partition of p_t1 for values in(2);
 -- Ensure we can remove non-PK columns for partitioned tables.
 explain (costs off) select * from p_t1 group by a,b,c,d;
-           QUERY PLAN           
---------------------------------
- HashAggregate
-   Group Key: p_t1.a, p_t1.b
-   ->  Append
-         ->  Seq Scan on p_t1_1
-         ->  Seq Scan on p_t1_2
-(5 rows)
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on p_t1_1
+   ->  Seq Scan on p_t1_2
+(3 rows)
 
 drop table t1 cascade;
 NOTICE:  drop cascades to table t1c
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 11c6f50fbf..09dcf15835 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -306,3 +306,391 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+    QUERY PLAN    
+------------------
+ Seq Scan on uqk1
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+          QUERY PLAN          
+------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: c, d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+           QUERY PLAN            
+---------------------------------
+ HashAggregate
+   Group Key: d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+     QUERY PLAN     
+--------------------
+ Seq Scan on uqk1
+   Filter: (pk = 1)
+(2 rows)
+
+-- Test join
+-- both uqk1(c, d) and uqk2(pk) are unique keys, so DISTINCT is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+           QUERY PLAN            
+---------------------------------
+ Hash Join
+   Hash Cond: (uqk1.a = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+   ->  Hash
+         ->  Seq Scan on uqk2
+(6 rows)
+
+-- DISTINCT is needed since the outer join can produce multiple NULL values.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+                 QUERY PLAN                  
+---------------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.pk
+         ->  Hash Right Join
+               Hash Cond: (uqk1.a = uqk2.pk)
+               ->  Seq Scan on uqk1
+               ->  Hash
+                     ->  Seq Scan on uqk2
+(8 rows)
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+ pk 
+----
+  1
+   
+(2 rows)
+
+-- DISTINCT is not needed since uqk1 is the preserved side of the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+           QUERY PLAN            
+---------------------------------
+ Sort
+   Sort Key: uqk1.c, uqk1.d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+ c | d 
+---+---
+ 1 | 1
+ 2 | 2
+ 3 | 3
+(3 rows)
+
+-- The combination of the UniqueKeys is still unique no matter the join clause
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+          QUERY PLAN          
+------------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+   ->  Materialize
+         ->  Seq Scan on uqk2
+(4 rows)
+
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+ a | pk | c | d | a | pk | c | d 
+---+----+---+---+---+----+---+---
+ 1 |  1 | 1 | 1 | 1 |  1 | 1 | 1
+ 1 |  1 | 1 | 1 | 4 |  4 | 4 | 4
+ 1 |  1 | 1 | 1 | 5 |  5 | 5 | 5
+ 2 |  2 | 2 | 2 | 1 |  1 | 1 | 1
+ 2 |  2 | 2 | 2 | 4 |  4 | 4 | 4
+ 2 |  2 | 2 | 2 | 5 |  5 | 5 | 5
+ 3 |  3 | 3 | 3 | 1 |  1 | 1 | 1
+ 3 |  3 | 3 | 3 | 4 |  4 | 4 | 4
+ 3 |  3 | 3 | 3 | 5 |  5 | 5 | 5
+(9 rows)
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+           QUERY PLAN           
+--------------------------------
+ Hash Semi Join
+   Hash Cond: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+   ->  Hash
+         ->  Seq Scan on uqk2
+(5 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+             QUERY PLAN             
+------------------------------------
+ Seq Scan on uqk1
+   Filter: (NOT (hashed SubPlan 1))
+   SubPlan 1
+     ->  Seq Scan on uqk2
+(4 rows)
+
+-- Test UniqueKey for the one-row case; DISTINCT is not needed either.
+-- uqk1.d is a uniquekey due to the onerow rule. uqk2.pk is the primary key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+            QUERY PLAN             
+-----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.pk)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+(5 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both uqk1.d and uqk2.c are uniquekeys due to the onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+            QUERY PLAN            
+----------------------------------
+ Nested Loop
+   Join Filter: (uqk1.d = uqk2.d)
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(6 rows)
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+ d 
+---
+(0 rows)
+
+-- Both UniqueKeys are in the targetList, so DISTINCT is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+        QUERY PLAN        
+--------------------------
+ Nested Loop
+   ->  Seq Scan on uqk1
+         Filter: (pk = 2)
+   ->  Seq Scan on uqk2
+         Filter: (pk = 1)
+(5 rows)
+
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+ c | c 
+---+---
+ 2 | 1
+(1 row)
+
+-- Test SubQuery
+-- t2(a, c) is a unique key because of the GROUP BY
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+              QUERY PLAN               
+---------------------------------------
+ Hash Join
+   Hash Cond: ((sum(uqk2.pk)) = t1.pk)
+   ->  HashAggregate
+         Group Key: uqk2.a, uqk2.c
+         ->  Seq Scan on uqk2
+   ->  Hash
+         ->  Seq Scan on uqk1 t1
+(7 rows)
+
+-- Test partitioned TABLE
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+-- create unique index on dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP INDEX dist_p_uk_b_c;
+-- we also support creating a unique index on each child table
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- NOT ok, since dist_p1 has no such index
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ HashAggregate
+   Group Key: dist_p.a, dist_p.b, dist_p.c, dist_p.d
+   ->  Append
+         ->  Seq Scan on dist_p0 dist_p_1
+         ->  Seq Scan on dist_p1 dist_p_2
+(5 rows)
+
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+             QUERY PLAN             
+------------------------------------
+ Append
+   ->  Seq Scan on dist_p0 dist_p_1
+   ->  Seq Scan on dist_p1 dist_p_2
+(3 rows)
+
+DROP TABLE dist_p;
+-- Test partition with different order.
+CREATE TABLE p (a INT NOT NULL, b DATE NOT NULL,c INT NOT NULL, d VARCHAR(5) NOT NULL) PARTITION BY LIST (a);
+-- First partition with different attr order.
+CREATE TABLE p1(d VARCHAR(5) NOT NULL, c INT NOT NULL, b DATE NOT NULL, a INT NOT NULL);
+ALTER TABLE p ATTACH PARTITION p1 FOR VALUES IN (1);
+-- 2nd partition
+CREATE TABLE p2 PARTITION OF p FOR VALUES IN (2);
+-- Unique Index locally without partition key.
+CREATE UNIQUE index p1_unq_d ON p1(d);
+CREATE UNIQUE index p2_unq_d ON p2(d);
+-- Unique index containing the partition key.
+CREATE UNIQUE index p_unq_ac ON  p(a, c);
+ANALYZE p;
+-- d is unique for its owner partition only.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM P;
+              QUERY PLAN              
+--------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: p.d
+         ->  Append
+               ->  Seq Scan on p1 p_1
+               ->  Seq Scan on p2 p_2
+(6 rows)
+
+-- d is unique for its owner partition, but this query talks to only one partition.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM p WHERE a = 1;
+    QUERY PLAN     
+-------------------
+ Seq Scan on p1 p
+   Filter: (a = 1)
+(2 rows)
+
+-- (a,c) is unique locally, but contains the partition key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT a, c FROM p WHERE a = 1;
+    QUERY PLAN     
+-------------------
+ Seq Scan on p1 p
+   Filter: (a = 1)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c FROM p WHERE a = 1;
+    QUERY PLAN     
+-------------------
+ Seq Scan on p1 p
+   Filter: (a = 1)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM p WHERE c = 1;
+        QUERY PLAN        
+--------------------------
+ Append
+   ->  Seq Scan on p1 p_1
+         Filter: (c = 1)
+   ->  Seq Scan on p2 p_2
+         Filter: (c = 1)
+(5 rows)
+
+DROP TABLE p;
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+              QUERY PLAN               
+---------------------------------------
+ Unique
+   ->  Sort
+         Sort Key: uqk1.c, uqk1.d
+         ->  Seq Scan on uqk1
+               Filter: (c IS NOT NULL)
+(5 rows)
+
+DEALLOCATE pt;
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
+-- Test Whole Row Not NULL cases
+CREATE TABLE x (a int UNIQUE);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM x WHERE x = '(1)'::x;
+         QUERY PLAN         
+----------------------------
+ Seq Scan on x
+   Filter: (x.* = '(1)'::x)
+(2 rows)
+
+DROP TABLE x;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 54f5cf7ecc..f4d7f7057f 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -427,13 +427,13 @@ explain (costs off) select * from t1 group by a,b,c,d;
 explain (costs off) select a,c from t1 group by a,c,d;
 
 -- Test removal across multiple relations
-explain (costs off) select *
+explain (costs off) select t2.*
 from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
-group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
+group by t1.a,t1.c,t1.d,t2.x,t2.y,t2.z;
 
 -- Test case where t1 can be optimized but not t2
 explain (costs off) select t1.*,t2.x,t2.z
-from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
 group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
 
 -- Cannot optimize when PK is deferrable
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 33102744eb..14bf36d908 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -135,3 +135,149 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+
+CREATE TABLE uqk1(a int, pk int primary key, c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, c int,  d int);
+INSERT INTO uqk1 VALUES(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
+INSERT INTO uqk2 VALUES(1, 1, 1, 1), (4, 4, 4, 4), (5, 5, 5, 5);
+ANALYZE uqk1;
+ANALYZE uqk2;
+
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+
+-- Test join
+-- both uqk1(c, d) and uqk2(pk) are unique keys, so DISTINCT is not needed.
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1, uqk2
+WHERE uqk1.c is NOT NULL AND uqk1.a = uqk2.pk;
+
+-- DISTINCT is needed since the outer join can produce multiple NULL values.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+order BY 1;
+
+SELECT DISTINCT uqk1.pk FROM uqk1 RIGHT JOIN uqk2 ON (uqk1.a = uqk2.pk) order BY 1;
+
+-- DISTINCT is not needed since uqk1 is the preserved side of the outer join
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+SELECT DISTINCT uqk1.c, uqk1.d FROM uqk1 LEFT JOIN uqk2 ON (uqk1.a = uqk2.pk)
+WHERE uqk1.c is NOT NULL order BY 1, 2;
+
+-- The combination of the UniqueKeys is still unique no matter the join clause
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1, uqk2;
+SELECT DISTINCT * FROM uqk1, uqk2 order BY 1, 2, 3, 4, 5, 6;
+
+-- Test Semi/Anti JOIN
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d in (SELECT d FROM uqk2);
+EXPLAIN (COSTS OFF) SELECT DISTINCT pk FROM uqk1 WHERE d NOT in (SELECT d FROM uqk2);
+
+-- Test UniqueKey for the one-row case; DISTINCT is not needed either.
+-- uqk1.d is a uniquekey due to the onerow rule. uqk2.pk is the primary key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk;
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk1.d = uqk2.pk order BY 1;
+
+-- Both uqk1.d and uqk2.c are uniquekeys due to the onerow rule
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2
+AND uqk2.pk = 1 AND uqk1.d = uqk2.d ;
+
+SELECT DISTINCT uqk1.d FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1
+AND uqk1.d = uqk2.d order BY 1;
+
+-- Both UniqueKeys are in the targetList, so DISTINCT is not needed.
+EXPLAIN (COSTS OFF) SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1;
+SELECT DISTINCT uqk1.c, uqk2.c FROM uqk1, uqk2 WHERE uqk1.pk = 2 AND uqk2.pk = 1 order BY 1, 2;
+
+-- Test SubQuery
+-- t2(a, c) is a unique key because of the GROUP BY
+EXPLAIN (COSTS OFF) SELECT DISTINCT t2.a, t2.c FROM uqk1 t1 inner JOIN
+(SELECT a, c, sum(pk) as t2b FROM uqk2 group BY a, c) t2
+ON (t2.t2b = t1.pk);
+
+-- Test partitioned TABLE
+CREATE TABLE dist_p (a int, b int NOT NULL, c int NOT NULL, d int) PARTITION BY RANGE (b);
+CREATE TABLE dist_p0 PARTITION OF dist_p FOR VALUES FROM (1) to (10);
+CREATE TABLE dist_p1 PARTITION OF dist_p FOR VALUES FROM (11) to (20);
+
+-- create unique index on dist_p
+CREATE UNIQUE INDEX dist_p_uk_b_c ON dist_p(b, c);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+DROP INDEX dist_p_uk_b_c;
+
+-- we also support creating a unique index on each child table
+CREATE UNIQUE INDEX dist_p0_uk_bc ON dist_p0(b, c);
+-- NOT ok, since dist_p1 has no such index
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+CREATE UNIQUE INDEX dist_p1_uk_bc ON dist_p1(b, c);
+-- OK now
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM dist_p;
+
+DROP TABLE dist_p;
+
+-- Test partition with different order.
+CREATE TABLE p (a INT NOT NULL, b DATE NOT NULL,c INT NOT NULL, d VARCHAR(5) NOT NULL) PARTITION BY LIST (a);
+
+-- First partition with different attr order.
+CREATE TABLE p1(d VARCHAR(5) NOT NULL, c INT NOT NULL, b DATE NOT NULL, a INT NOT NULL);
+ALTER TABLE p ATTACH PARTITION p1 FOR VALUES IN (1);
+
+-- 2nd partition
+CREATE TABLE p2 PARTITION OF p FOR VALUES IN (2);
+
+-- Unique Index locally without partition key.
+CREATE UNIQUE index p1_unq_d ON p1(d);
+CREATE UNIQUE index p2_unq_d ON p2(d);
+
+-- Unique index containing the partition key.
+CREATE UNIQUE index p_unq_ac ON  p(a, c);
+
+ANALYZE p;
+
+-- d is unique for its owner partition only.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM P;
+
+-- d is unique for its owner partition, but this query talks to only one partition.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM p WHERE a = 1;
+
+-- (a,c) is unique locally, but contains the partition key.
+EXPLAIN (COSTS OFF) SELECT DISTINCT a, c FROM p WHERE a = 1;
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c FROM p WHERE a = 1;
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM p WHERE c = 1;
+
+DROP TABLE p;
+-- Test some VIEW
+CREATE VIEW distinct_v1 as SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM distinct_v1;
+
+-- Test generic plan
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+prepare pt as SELECT * FROM distinct_v1;
+EXPLAIN (COSTS OFF)  execute pt;
+ALTER TABLE uqk1 ALTER COLUMN d DROP NOT NULL;
+EXPLAIN (COSTS OFF) execute pt;
+DEALLOCATE pt;
+
+DROP VIEW distinct_v1;
+DROP TABLE uqk1;
+DROP TABLE uqk2;
+
+-- Test Whole Row Not NULL cases
+CREATE TABLE x (a int UNIQUE);
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM x WHERE x = '(1)'::x;
+DROP TABLE x;
-- 
2.21.0

v11-0006-Join-removal-at-run-time-with-UniqueKey.patch (application/octet-stream)
From f27115d044b5f6babb722b45d7809954f2c76bbf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Thu, 7 May 2020 08:36:29 +0800
Subject: [PATCH v11 6/6] Join removal at run-time with UniqueKey.

We add another join-removal pass during build_join_rel to take
advantage of UniqueKey, alongside the existing remove_useless_joins.
This new strategy can't replace the current one, since at that point
we only see two relations at a time, so it can't handle something like

SELECT a.* FROM a LEFT JOIN (b LEFT JOIN c ON b.c_id = c.id)
ON (a.b_id = b.id);

However, it can handle cases the current strategy can't, such as

SELECT  t1.a
FROM m3 t1
LEFT JOIN (SELECT m1.a FROM m1, m2 WHERE m1.b = m2.a) t2;

As for the implementation, it is a very rough PoC. The main idea is
that if join_canbe_removed() says yes, we still need a joinrel, but
one carrying the information of the outerrel only. Getting there
required several workarounds:

1. If the innerrel can be removed, we don't need to build a pathlist for
   the joinrel; we just reuse the pathlist from the outerrel. However,
   many places assert rel->pathlist[*]->parent == rel, so I copied the
   pathlist and changed each copy's parent to the joinrel.
2. During create_plan, some paths on RTE_RELATION need to know the
   relation Oid via path->parent->relid, so we overwrite the joinrel's
   relid (previously 0) with outerrel->relid.
3. Much the same set of paths usually asserts best_path->parent->rtekind
   == RTE_RELATION; since such paths may now appear in a joinrel, I
   overwrite joinrel->rtekind with the outerrel's rtekind.
4. There seem to be dependencies between path->pathtarget and
   rel->reltarget; since we reuse the outerrel's pathlist, I reuse the
   outerrel's reltarget as well. When the join can be removed,
   list_length(outerrel->reltarget->exprs) should be >=
   list_length(joinrel->reltarget->exprs), so we can
   rely on a ProjectionPath to reduce the tlist.
---
 src/backend/optimizer/path/joinrels.c |   8 +-
 src/backend/optimizer/util/relnode.c  | 263 +++++++++++++++++++++++++-
 src/include/optimizer/pathnode.h      |   4 +-
 src/test/regress/expected/join.out    |  39 ++++
 src/test/regress/sql/join.sql         |  26 +++
 5 files changed, 333 insertions(+), 7 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index b9163ee8ff..7e393c09f9 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -691,6 +691,7 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 	SpecialJoinInfo sjinfo_data;
 	RelOptInfo *joinrel;
 	List	   *restrictlist;
+	bool	innerrel_removed = false;
 
 	/* We should never try to join two overlapping sets of rels. */
 	Assert(!bms_overlap(rel1->relids, rel2->relids));
@@ -744,7 +745,7 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 	 * goes with this particular joining.
 	 */
 	joinrel = build_join_rel(root, joinrelids, rel1, rel2, sjinfo,
-							 &restrictlist);
+							 &restrictlist, &innerrel_removed);
 
 	/*
 	 * If we've already proven this join is empty, we needn't consider any
@@ -756,9 +757,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 		return joinrel;
 	}
 
+	if (!innerrel_removed)
 	/* Add paths to the join relation. */
-	populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
-								restrictlist);
+		populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
+									restrictlist);
 
 	bms_free(joinrelids);
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index a203e6f1ff..f488d87b1c 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -22,6 +22,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/inherit.h"
+#include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/placeholder.h"
@@ -73,6 +74,11 @@ static void build_child_join_reltarget(PlannerInfo *root,
 									   int nappinfos,
 									   AppendRelInfo **appinfos);
 
+static bool join_canbe_removed(PlannerInfo *root,
+							   SpecialJoinInfo *sjinfo,
+							   RelOptInfo *joinrel,
+							   RelOptInfo *innerrel,
+							   List *restrictlist);
 
 /*
  * setup_simple_rel_arrays
@@ -579,7 +585,8 @@ build_join_rel(PlannerInfo *root,
 			   RelOptInfo *outer_rel,
 			   RelOptInfo *inner_rel,
 			   SpecialJoinInfo *sjinfo,
-			   List **restrictlist_ptr)
+			   List **restrictlist_ptr,
+			   bool *innerrel_removed)
 {
 	RelOptInfo *joinrel;
 	List	   *restrictlist;
@@ -718,6 +725,64 @@ build_join_rel(PlannerInfo *root,
 	 */
 	joinrel->has_eclass_joins = has_relevant_eclass_joinclause(root, joinrel);
 
+	if (join_canbe_removed(root, sjinfo,
+						   joinrel, inner_rel,
+						   restrictlist))
+	{
+		ListCell *lc;
+
+		joinrel->rows = outer_rel->rows;
+		joinrel->consider_startup = outer_rel->consider_startup;
+		joinrel->consider_param_startup = outer_rel->consider_param_startup;
+		joinrel->consider_parallel = outer_rel->consider_parallel;
+
+		/* Rely on the projection path to reduce the tlist. */
+		joinrel->reltarget = outer_rel->reltarget;
+
+		joinrel->direct_lateral_relids = outer_rel->direct_lateral_relids;
+		joinrel->lateral_relids = outer_rel->lateral_relids;
+
+		joinrel->unique_for_rels = outer_rel->unique_for_rels;
+		joinrel->non_unique_for_rels = outer_rel->non_unique_for_rels;
+		joinrel->baserestrictinfo = outer_rel->baserestrictinfo;
+		joinrel->baserestrictcost = outer_rel->baserestrictcost;
+		joinrel->baserestrict_min_security = outer_rel->baserestrict_min_security;
+		joinrel->uniquekeys = outer_rel->uniquekeys;
+		joinrel->consider_partitionwise_join = outer_rel->consider_partitionwise_join;
+		joinrel->top_parent_relids = outer_rel->top_parent_relids;
+
+		/* Some scan paths need to know which base relation to scan via the
+		 * relid field, so we have to use the outerrel's relid.
+		 */
+		joinrel->relid = outer_rel->relid;
+
+		/* Much the same paths as above assert that the rtekind is RTE_RELATION,
+		 * so we need to set it to match the outerrel as well.
+		 */
+		joinrel->rtekind = RTE_RELATION;
+
+		/* Make sure path->parent points to the current joinrel; we can't update it in place. */
+		foreach(lc, outer_rel->pathlist)
+		{
+			Size sz = size_of_path(lfirst(lc));
+			Path *path = palloc(sz);
+			memcpy(path, lfirst(lc), sz);
+			path->parent = joinrel;
+			add_path(joinrel, path);
+		}
+
+		foreach(lc, outer_rel->partial_pathlist)
+		{
+			Size sz = size_of_path(lfirst(lc));
+			Path *path = palloc(sz);
+			memcpy(path, lfirst(lc), sz);
+			path->parent = joinrel;
+			add_partial_path(joinrel, path);
+		}
+		*innerrel_removed = true;
+	}
+	else
+	{
 	/* Store the partition information. */
 	build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
 								 sjinfo->jointype);
@@ -746,7 +811,7 @@ build_join_rel(PlannerInfo *root,
 		is_parallel_safe(root, (Node *) restrictlist) &&
 		is_parallel_safe(root, (Node *) joinrel->reltarget->exprs))
 		joinrel->consider_parallel = true;
-
+	}
 	/* Add the joinrel to the PlannerInfo. */
 	add_join_rel(root, joinrel);
 
@@ -759,11 +824,18 @@ build_join_rel(PlannerInfo *root,
 	if (root->join_rel_level)
 	{
 		Assert(root->join_cur_level > 0);
-		Assert(root->join_cur_level <= bms_num_members(joinrel->relids));
+		/* Assert(root->join_cur_level <= bms_num_members(joinrel->relids)); */
 		root->join_rel_level[root->join_cur_level] =
 			lappend(root->join_rel_level[root->join_cur_level], joinrel);
 	}
 
 	return joinrel;
 }
 
@@ -2027,3 +2099,188 @@ build_child_join_reltarget(PlannerInfo *root,
 	childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
 	childrel->reltarget->width = parentrel->reltarget->width;
 }
+
+/*
+ * join_canbe_removed
+ *		Check whether a left join's inner rel can be removed: the join must
+ *		not reference any column of the inner rel above the join, and the
+ *		inner rel must be unique on the join clauses' inner-side expressions.
+ */
+static bool
+join_canbe_removed(PlannerInfo *root,
+				   SpecialJoinInfo *sjinfo,
+				   RelOptInfo *joinrel,
+				   RelOptInfo *innerrel,
+				   List *restrictlist)
+{
+	Bitmapset	*vars;
+	List	*exprs = NIL;
+	ListCell	*lc;
+	Bitmapset	*tmp;
+	bool	res;
+
+	if (sjinfo->jointype != JOIN_LEFT)
+		return false;
+
+	if (innerrel->uniquekeys == NIL)
+		return false;
+
+	/*
+	 * Check whether any of innerrel's columns are still needed above the
+	 * join; if so, the join can't be removed.
+	 */
+
+	vars = pull_varnos((Node*)joinrel->reltarget->exprs);
+	tmp = bms_intersect(vars, innerrel->relids);
+	if (!bms_is_empty(tmp))
+		return false;
+
+	foreach(lc, restrictlist)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+		if (rinfo->can_join)
+		{
+			if (rinfo->mergeopfamilies != NIL)
+			{
+				if (bms_is_subset(rinfo->left_relids, innerrel->relids))
+					exprs = lappend(exprs, get_leftop(rinfo->clause));
+				else if (bms_is_subset(rinfo->right_relids, innerrel->relids))
+					exprs = lappend(exprs, get_rightop(rinfo->clause));
+				else
+					Assert(false);
+			}
+			else
+				/* Not mergeable join clause, we have to keep it */
+				return false;
+		}
+		else
+		{
+			/*
+			 * If the rinfo is not a join clause and was not pushed down to
+			 * the base relation's baserestrictinfo, it must come from the ON
+			 * clause.  Example: SELECT .. FROM t1 LEFT JOIN t2 ON t1.a = 10;
+			 * In this case we can't remove the inner rel either.
+			 */
+			return false;
+		}
+	}
+	res = relation_has_uniquekeys_for(root, innerrel, exprs, true);
+	return res;
+}
+
+
+size_t
+size_of_path(Path *path)
+{
+	switch(path->type)
+	{
+		case T_Path:
+			return sizeof(Path);
+		case T_IndexPath:
+			return sizeof(IndexPath);
+		case T_BitmapHeapPath:
+			return sizeof(BitmapHeapPath);
+		case T_TidPath:
+			return sizeof(TidPath);
+		case T_SubqueryScanPath:
+			return sizeof(SubqueryScanPath);
+		case T_ForeignPath:
+			return sizeof(ForeignPath);
+		case T_CustomPath:
+			return sizeof(CustomPath);
+		case T_NestPath:
+			return sizeof(NestPath);
+		case T_MergePath:
+			return sizeof(MergePath);
+		case T_HashPath:
+			return sizeof(HashPath);
+		case T_AppendPath:
+			return sizeof(AppendPath);
+		case T_MergeAppendPath:
+			return sizeof(MergeAppendPath);
+		case T_GroupResultPath:
+			return sizeof(GroupResultPath);
+		case T_MaterialPath:
+			return sizeof(MaterialPath);
+		case T_UniquePath:
+			return sizeof(UniquePath);
+		case T_GatherPath:
+			return sizeof(GatherPath);
+		case T_GatherMergePath:
+			return sizeof(GatherMergePath);
+		case T_ProjectionPath:
+			return sizeof(ProjectionPath);
+		case T_ProjectSetPath:
+			return sizeof(ProjectSetPath);
+		case T_SortPath:
+			return sizeof(SortPath);
+		case T_IncrementalSortPath:
+			return sizeof(IncrementalSortPath);
+		case T_GroupPath:
+			return sizeof(GroupPath);
+		case T_UpperUniquePath:
+			return sizeof(UpperUniquePath);
+		case T_AggPath:
+			return sizeof(AggPath);
+		case T_GroupingSetsPath:
+			return sizeof(GroupingSetsPath);
+		case T_MinMaxAggPath:
+			return sizeof(MinMaxAggPath);
+		case T_WindowAggPath:
+			return sizeof(WindowAggPath);
+		case T_SetOpPath:
+			return sizeof(SetOpPath);
+		case T_RecursiveUnionPath:
+			return sizeof(RecursiveUnionPath);
+		case T_LockRowsPath:
+			return sizeof(LockRowsPath);
+		case T_ModifyTablePath:
+			return sizeof(ModifyTablePath);
+		case T_LimitPath:
+			return sizeof(LimitPath);
+		default:
+			elog(ERROR, "unrecognized path type: %d",
+				 (int) path->type);
+			break;
+	}
+	return 0;
+}
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 715a24ad29..254961b2b4 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -294,7 +294,8 @@ extern RelOptInfo *build_join_rel(PlannerInfo *root,
 								  RelOptInfo *outer_rel,
 								  RelOptInfo *inner_rel,
 								  SpecialJoinInfo *sjinfo,
-								  List **restrictlist_ptr);
+								  List **restrictlist_ptr,
+								  bool *innerrel_removed);
 extern Relids min_join_parameterization(PlannerInfo *root,
 										Relids joinrelids,
 										RelOptInfo *outer_rel,
@@ -321,4 +322,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
 										RelOptInfo *parent_joinrel, List *restrictlist,
 										SpecialJoinInfo *sjinfo, JoinType jointype);
 
+extern size_t size_of_path(Path *path);
 #endif							/* PATHNODE_H */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 8378936eda..b59c9a73ae 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6283,3 +6283,42 @@ where exists (select 1 from j3
 (13 rows)
 
 drop table j3;
+create table m1 (a int primary key,  b int, c int);
+create table m2 (a int primary key,  b int, c int);
+create table m3 (a int primary key,  b int, c int);
+explain (verbose, costs off)
+select  t1.a
+from m3 t1
+left join (select m1.a from m1, m2 where m1.b = m2.a) t2
+on (t1.a = t2.a);
+        QUERY PLAN        
+--------------------------
+ Seq Scan on public.m3 t1
+   Output: t1.a
+(2 rows)
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on (m1.a = m2.a)
+and m1.b in (select b from m3);
+         QUERY PLAN         
+----------------------------
+ Seq Scan on public.m1
+   Output: m1.a, m1.b, m1.c
+(2 rows)
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on m1.b = m2.a
+and m2.b in (select b from m3);
+         QUERY PLAN         
+----------------------------
+ Seq Scan on public.m1
+   Output: m1.a, m1.b, m1.c
+(2 rows)
+
+drop table m1;
+drop table m2;
+drop table m3;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3312542411..317354547d 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2169,3 +2169,29 @@ where exists (select 1 from j3
       and t1.unique1 < 1;
 
 drop table j3;
+
+create table m1 (a int primary key,  b int, c int);
+create table m2 (a int primary key,  b int, c int);
+create table m3 (a int primary key,  b int, c int);
+
+explain (verbose, costs off)
+select  t1.a
+from m3 t1
+left join (select m1.a from m1, m2 where m1.b = m2.a) t2
+on (t1.a = t2.a);
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on (m1.a = m2.a)
+and m1.b in (select b from m3);
+
+explain (verbose, costs off)
+select m1.*
+from m1 left join m2
+on m1.b = m2.a
+and m2.b in (select b from m3);
+
+drop table m1;
+drop table m2;
+drop table m3;
-- 
2.21.0

#56Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andy Fan (#55)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Wed, Sep 09, 2020 at 07:51:12AM +0800, Andy Fan wrote:

Thank you Michael for checking it; I can reproduce the same locally after
rebasing to the latest master. The attached v11 fixes it and includes
the fix Floris found.

The status of this patch is that we are still discussing which data type
UniqueKey->expr should use. Both David [1] and I [2] shared some thoughts
about EquivalenceClasses, but neither of us has decided on it, so I haven't
changed anything there yet. I can change it once we have decided.

[1]
/messages/by-id/CAApHDvoDMyw=hTuW-258yqNK4bhW6CpguJU_GZBh4x+rnoem3w@mail.gmail.com

[2]
/messages/by-id/CAKU4AWqy3Uv67=PR8RXG6LVoO-cMEwfW_LMwTxHdGrnu+cf+dA@mail.gmail.com

Hi,

In the Index Skip Scan thread Peter mentioned a couple of issues that I
believe need to be addressed here. In fact, the one about valgrind errors
is already fixed as far as I can see (nkeycolumns instead of ncolumns);
the other one was:

/code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c:
In function ‘populate_baserel_uniquekeys’:
/code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c:797:13:
warning: ‘expr’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
797 | else if (!list_member(unique_index->rel->reltarget->exprs, expr))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Other than that I wanted to ask what are the plans to proceed with this
patch? It's been a while since the question was raised in which format
to keep unique key expressions, and as far as I can see no detailed
suggestions or patch changes were proposed as a follow up. Obviously I
would love to see the first two preparation patches committed to avoid
dependencies between patches, and want to suggest an incremental
approach with simple format for start (what we have right now) with the
idea how to extend it in the future to cover more cases.

#57Andy Fan
zhihui.fan1213@gmail.com
In reply to: Dmitry Dolgov (#56)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Wed, Oct 7, 2020 at 9:55 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Wed, Sep 09, 2020 at 07:51:12AM +0800, Andy Fan wrote:

Thank you Michael for checking it, I can reproduce the same locally after
rebasing to the latest master. The attached v11 has fixed it and includes
the fix Floris found.

The status of this patch is we are still in discussion about which data
type should
UniqueKey->expr use. Both David [1] and I [2] shared some thinking about
EquivalenceClasses, but neither of us have decided on it. So I still

didn't

change
anything about that now. I can change it once we have decided on it.

[1]

/messages/by-id/CAApHDvoDMyw=hTuW-258yqNK4bhW6CpguJU_GZBh4x+rnoem3w@mail.gmail.com

[2]

/messages/by-id/CAKU4AWqy3Uv67=PR8RXG6LVoO-cMEwfW_LMwTxHdGrnu+cf+dA@mail.gmail.com

Hi,

In the Index Skip Scan thread Peter mentioned couple of issues that I
believe need to be addressed here. In fact one about valgrind errors was
already fixed as far as I see (nkeycolumns instead of ncolumns), another
one was:

/code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c:
In function ‘populate_baserel_uniquekeys’:

/code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c:797:13:
warning: ‘expr’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
797 | else if (!list_member(unique_index->rel->reltarget->exprs, expr))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I can fix this warning in the next version; thanks for reporting it. It
can be fixed as below, or by adjusting the if/else-if pattern.

--- a/src/backend/optimizer/path/uniquekeys.c
+++ b/src/backend/optimizer/path/uniquekeys.c
@@ -760,6 +760,7 @@ get_exprs_from_uniqueindex(IndexOptInfo *unique_index,
                {
                        /* Index on system column is not supported */
                        Assert(false);
+                       expr = NULL; /* make compiler happy */
                }

Other than that I wanted to ask what are the plans to proceed with this
patch? It's been a while since the question was raised in which format
to keep unique key expressions, and as far as I can see no detailed
suggestions or patch changes were proposed as a follow up. Obviously I
would love to see the first two preparation patches committed to avoid
dependencies between patches, and want to suggest an incremental
approach with simple format for start (what we have right now) with the
idea how to extend it in the future to cover more cases.

I think the hardest part of this series is commit 2; it probably needs a
lot of dedicated review time, which would be the hardest part for the
reviewers. I don't have a good suggestion, however.

--
Best Regards
Andy Fan

#58Hou, Zhijie
houzj.fnst@cn.fujitsu.com
In reply to: Andy Fan (#57)
RE: [PATCH] Keeps tracking the uniqueness with UniqueKey

Hi

I have a look over this patch and find some typos in 0002.

1. Some typos about "unique":
There are some spelling mistakes of "unique" in code comments and the README,
such as: "+However we define the UnqiueKey as below."

2. Function name initililze_uniquecontext_for_joinrel:
Maybe it should be initialize_uniquecontext_for_joinrel.

3. A typo in a comment:
+ * baserelation's basicrestrictinfo. so it must be in ON clauses.

I think it should be "basicrestrictinfo" => "baserestrictinfo".

Besides, I think list_copy can be used to simplify the following code.
(But it seems the type of expr is still under discussion, so this may have no impact.)
+	List	*exprs = NIL;
...
+	foreach(lc, unionrel->reltarget->exprs)
+	{
+		exprs = lappend(exprs, lfirst(lc));
+	}

Best regards,

#59Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Andy Fan (#57)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Thu, Oct 08, 2020 at 09:34:51AM +0800, Andy Fan wrote:

Other than that I wanted to ask what are the plans to proceed with this
patch? It's been a while since the question was raised in which format
to keep unique key expressions, and as far as I can see no detailed
suggestions or patch changes were proposed as a follow up. Obviously I
would love to see the first two preparation patches committed to avoid
dependencies between patches, and want to suggest an incremental
approach with simple format for start (what we have right now) with the
idea how to extend it in the future to cover more cases.

I think the hardest part of this series is commit 2, it probably needs
lots of
dedicated time to review which would be the hardest part for the reviewers.
I don't have a good suggestion, however.

Sure, and I would review the patch as well. But as far as I understand
the main issue is "how to store uniquekey expressions", and as long as
it is not decided, no additional review will move the patch forward I
guess.

#60Andy Fan
zhihui.fan1213@gmail.com
In reply to: Dmitry Dolgov (#59)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Thu, Oct 8, 2020 at 6:39 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Thu, Oct 08, 2020 at 09:34:51AM +0800, Andy Fan wrote:

Other than that I wanted to ask what are the plans to proceed with this
patch? It's been a while since the question was raised in which format
to keep unique key expressions, and as far as I can see no detailed
suggestions or patch changes were proposed as a follow up. Obviously I
would love to see the first two preparation patches committed to avoid
dependencies between patches, and want to suggest an incremental
approach with simple format for start (what we have right now) with the
idea how to extend it in the future to cover more cases.

I think the hardest part of this series is commit 2, it probably needs
lots of
dedicated time to review which would be the hardest part for the

reviewers.

I don't have a good suggestion, however.

Sure, and I would review the patch as well.

Thank you very much!

But as far as I understand
the main issue is "how to store uniquekey expressions", and as long as
it is not decided, no additional review will move the patch forward I
guess.

I don't think so :) The patch may have other issues as well: for example,
logic errors, duplicated code, or cases that need improvement.

--
Best Regards
Andy Fan

#61Andy Fan
zhihui.fan1213@gmail.com
In reply to: Hou, Zhijie (#58)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Thu, Oct 8, 2020 at 12:12 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com>
wrote:

Hi

I have a look over this patch and find some typos in 0002.

1.Some typos about unique:
There are some spelling mistakes about "unique" in code comments and
README.
Such as: "+However we define the UnqiueKey as below."

2.function name about initililze_uniquecontext_for_joinrel:
May be it should be initialize_ uniquecontext_for_joinrel.

3.some typos in comment:
+ * baserelation's basicrestrictinfo. so it must be
in ON clauses.

I think it shoule be " basicrestrictinfo " => "baserestrictinfo".

Besides, I think list_copy can be used to simplify the following code.
(But It seems the type of expr is still in discussion, so this may has no
impact )
+       List    *exprs = NIL;
...
+       foreach(lc, unionrel->reltarget->exprs)
+       {
+               exprs = lappend(exprs, lfirst(lc));
+       }

Best regards,

Thank you zhijie, I will fix them in next version.

--
Best Regards
Andy Fan

#62Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#61)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

This patch has stopped moving for a while; any suggestion about
how to move on is appreciated.

--
Best Regards
Andy Fan

#63Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Andy Fan (#62)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On 26/11/2020 16:58, Andy Fan wrote:

This patch has stopped moving for a while,  any suggestion about
how to move on is appreciated.

The question on whether UniqueKey.exprs should be a list of
EquivalenceClasses or PathKeys is unresolved. I don't have an opinion on
that, but I'd suggest that you pick one or the other and just go with
it. If it turns out to be a bad choice, then we'll change it.

Quickly looking at the patches, there's one thing I think no one's
mentioned yet, but looks really ugly to me:

+		/* Make sure the path->parent point to current joinrel, can't update it in-place. */
+		foreach(lc, outer_rel->pathlist)
+		{
+			Size sz = size_of_path(lfirst(lc));
+			Path *path = palloc(sz);
+			memcpy(path, lfirst(lc), sz);
+			path->parent = joinrel;
+			add_path(joinrel, path);
+		}

Copying a Path and modifying it like that is not good, there's got to be
a better way to do this. Perhaps wrap the original Paths in
ProjectionPaths, where the ProjectionPath's parent is the joinrel and
dummypp=true.

- Heikki

#64Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Heikki Linnakangas (#63)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Hi,

On 11/30/20 5:04 AM, Heikki Linnakangas wrote:

On 26/11/2020 16:58, Andy Fan wrote:

This patch has stopped moving for a while,  any suggestion about
how to move on is appreciated.

The question on whether UniqueKey.exprs should be a list of
EquivalenceClasses or PathKeys is unresolved. I don't have an opinion
on that, but I'd suggest that you pick one or the other and just go
with it. If it turns out to be a bad choice, then we'll change it.

In this case I think it is a matter of deciding if we are going to use
EquivalenceClasses or Exprs before going further; there has been work
ongoing in this area for a while, so having a clear direction from a
committer would be greatly appreciated.

Deciding would also help potential reviewers to give more feedback on
the features implemented on top of the base.

Should there be a new thread with the minimum requirements in order to
get closer?

Best regards,
 Jesper

#65Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Jesper Pedersen (#64)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On 30/11/2020 16:30, Jesper Pedersen wrote:

On 11/30/20 5:04 AM, Heikki Linnakangas wrote:

On 26/11/2020 16:58, Andy Fan wrote:

This patch has stopped moving for a while,  any suggestion about
how to move on is appreciated.

The question on whether UniqueKey.exprs should be a list of
EquivalenceClasses or PathKeys is unresolved. I don't have an opinion
on that, but I'd suggest that you pick one or the other and just go
with it. If it turns out to be a bad choice, then we'll change it.

In this case I think it is matter of deciding if we are going to use
EquivalenceClasses or Exprs before going further; there has been work
ongoing in this area for a while, so having a clear direction from a
committer would be greatly appreciated.

Plain Exprs are not good enough, because you need to know which operator
the expression is unique on. Usually, it's the default = operator in the
default btree opclass for the datatype, but it could be something else, too.

There's some precedence for PathKeys, as we generate PathKeys to
represent the DISTINCT column in PlannerInfo->distinct_pathkeys. On the
other hand, I've always found it confusing that we use PathKeys to
represent DISTINCT and GROUP BY, which are not actually sort orderings.
Perhaps it would make sense to store EquivalenceClass+opfamily in
UniqueKey, and also replace distinct_pathkeys and group_pathkeys with
UniqueKeys.

That's just my 2 cents though, others more familiar with this planner
code might have other opinions...

- Heikki

#66Hou, Zhijie
houzj.fnst@cn.fujitsu.com
In reply to: Heikki Linnakangas (#65)
RE: [PATCH] Keeps tracking the uniqueness with UniqueKey

Hi

I look into the patch again and have some comments.

1.
+	Size oid_cmp_len = sizeof(Oid) * ind1->ncolumns;
+
+	return ind1->ncolumns == ind2->ncolumns &&
+		ind1->unique == ind2->unique &&
+		memcmp(ind1->indexkeys, ind2->indexkeys, sizeof(int) * ind1->ncolumns) == 0 &&
+		memcmp(ind1->opfamily, ind2->opfamily, oid_cmp_len) == 0 &&
+		memcmp(ind1->opcintype, ind2->opcintype, oid_cmp_len) == 0 &&
+		memcmp(ind1->sortopfamily, ind2->sortopfamily, oid_cmp_len) == 0 &&
+		equal(get_tlist_exprs(ind1->indextlist, true),
+			  get_tlist_exprs(ind2->indextlist, true));

The length of sortopfamily, opfamily and opcintype seems to be ->nkeycolumns,
not ->ncolumns. I checked function get_relation_info, where the IndexOptInfo
is initialized. (If there are other places that can change the length,
please correct me.)

2.

+	COPY_SCALAR_FIELD(ncolumns);
+	COPY_SCALAR_FIELD(nkeycolumns);
+	COPY_SCALAR_FIELD(unique);
+	COPY_SCALAR_FIELD(immediate);
+	/* We just need to know if it is NIL or not */
+	COPY_SCALAR_FIELD(indpred);
+	COPY_SCALAR_FIELD(predOK);
+	COPY_POINTER_FIELD(indexkeys, from->ncolumns * sizeof(int));
+	COPY_POINTER_FIELD(indexcollations, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opfamily, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opcintype, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(sortopfamily, from->ncolumns * sizeof(Oid));
+	COPY_NODE_FIELD(indextlist);

Same as 1: nkeycolumns should be used here too, if I am right.

3.
+	foreach(lc, newnode->indextlist)
+	{
+		TargetEntry *tle = lfirst_node(TargetEntry, lc);
+		/* Index on expression is ignored */
+		Assert(IsA(tle->expr, Var));
+		tle->expr = (Expr *) find_parent_var(appinfo, (Var *) tle->expr);
+		newnode->indexkeys[idx] = castNode(Var, tle->expr)->varattno;
+		idx++;
+	}

The counter variable 'idx' can be replaced by foreach_current_index().

Best regards,
houzj

#67Andy Fan
zhihui.fan1213@gmail.com
In reply to: Heikki Linnakangas (#65)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Thank you Heikki for your attention.

On Mon, Nov 30, 2020 at 11:20 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 30/11/2020 16:30, Jesper Pedersen wrote:

On 11/30/20 5:04 AM, Heikki Linnakangas wrote:

On 26/11/2020 16:58, Andy Fan wrote:

This patch has stopped moving for a while, any suggestion about
how to move on is appreciated.

The question on whether UniqueKey.exprs should be a list of
EquivalenceClasses or PathKeys is unresolved. I don't have an opinion
on that, but I'd suggest that you pick one or the other and just go
with it. If it turns out to be a bad choice, then we'll change it.

In this case I think it is matter of deciding if we are going to use
EquivalenceClasses or Exprs before going further; there has been work
ongoing in this area for a while, so having a clear direction from a
committer would be greatly appreciated.

Plain Exprs are not good enough, because you need to know which operator
the expression is unique on. Usually, it's the default = operator in the
default btree opclass for the datatype, but it could be something else,
too.

Actually I can't understand this, could you explain more? Based on my
current knowledge, when we run "SELECT DISTINCT a FROM t", we never care
about which operator is used for the uniqueness.

There's some precedence for PathKeys, as we generate PathKeys to
represent the DISTINCT column in PlannerInfo->distinct_pathkeys. On the
other hand, I've always found it confusing that we use PathKeys to
represent DISTINCT and GROUP BY, which are not actually sort orderings.

OK, I have the same confusion now:)

Perhaps it would make sense to store EquivalenceClass+opfamily in
UniqueKey, and also replace distinct_pathkeys and group_pathkeys with
UniqueKeys.

I can understand why we need EquivalenceClass for UniqueKey, but I can't
understand why we need opfamily here.

For anyone who is interested in these patchsets, here is my plan now:
1) I will try EquivalenceClass rather than Expr in UniqueKey, and add
opfamily if needed. 2) I will start a new thread to continue this topic;
the current thread is too long, which may scare off some people who might
have an interest in it. 3) I will give up patches 5 & 6 for now. One
reason is that I am not happy with the current implementation, and the
other is that I want to make the patchset smaller to make review easier.
I will not give them up forever; after the main part of this patchset is
committed, I will continue with them in a new thread.

Thanks everyone for your input.

--
Best Regards
Andy Fan

#68Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Andy Fan (#67)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On 05/12/2020 17:10, Andy Fan wrote:

Actually I can't understand this, could you explain more? Based on my
current knowledge, when we run "SELECT DISTINCT a FROM t", we never care
about which operator to use for the unique.

SortGroupClause includes an 'eqop' field, which determines the operator
that the expression needs to be made unique with. The syntax doesn't let
you set it to anything else than the default btree opclass of the
datatype, though. But you can specify it for ORDER BY, and we use
SortGroupClauses to represent both sorting and grouping.

Also, if you use the same struct to also represent columns that you know
to be unique, and not just the DISTINCT clause in the query, then you
need the operator. For example, if you create a unique index on a
non-default opfamily.

There's some precedence for PathKeys, as we generate PathKeys to
represent the DISTINCT column in PlannerInfo->distinct_pathkeys. On the
other hand, I've always found it confusing that we use PathKeys to
represent DISTINCT and GROUP BY, which are not actually sort orderings.

OK, I have the same confusion now :)

Perhaps it would  make sense to store EquivalenceClass+opfamily in
UniqueKey, and also replace distinct_pathkeys and group_pathkeys with
UniqueKeys.

I can understand why we need EquivalenceClass for UniqueKey, but I can't
understand why we need opfamily here.

Thinking a bit harder, I guess we don't. Because EquivalenceClass
includes the operator family already, in the ec_opfamilies field.

For anyone who is interested with these patchsets, here is my plan
about this now. 1). I will try EquivalenceClass rather than Expr in
UniqueKey and add opfamily if needed. 2). I will start a new thread
to continue this topic. The current thread is too long which may
scare some people who may have interest in it. 3). I will give up
patch 5 & 6 for now. one reason I am not happy with the current
implementation, and the other reason is I want to make the patchset
smaller to make the reviewer easier. I will not give up them forever,
after the main part of this patchset is committed, I will continue
with them in a new thread. Thanks everyone for your input.

Sounds like a plan.

- Heikki

#69Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#68)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Heikki Linnakangas <hlinnaka@iki.fi> writes:

I can understand why we need EquivalenceClass for UniqueKey, but I can't
understand why we need opfamily here.

Thinking a bit harder, I guess we don't. Because EquivalenceClass
includes the operator family already, in the ec_opfamilies field.

No. EquivalenceClasses only care about equality, which is why they
might potentially mention several opfamilies that share an equality
operator. If you care about sort order, you *cannot* rely on an
EquivalenceClass to depict that. Now, abstract uniqueness also only
cares about equality, but if you are going to implement it via sort-
and-unique then you need to settle on a sort order.

I agree we are overspecifying DISTINCT by settling on a sort operator at
parse time, rather than considering all the possibilities at plan time.
But given that opfamilies sharing equality are mostly a hypothetical
use-case, I'm not in a big hurry to fix it. Before we had ASC/DESC
indexes, there was a real use-case for making a "reverse sort" opclass,
with the same equality as the type's regular opclass but the opposite sort
order. But that's ancient history now, and I've seen few other plausible
use-cases.

I have not been following this thread closely enough to understand
why we need a new "UniqueKeys" data structure at all. But if the
motivation is only to remove this overspecification, I humbly suggest
that it ain't worth the trouble.

regards, tom lane

#70Andy Fan
zhihui.fan1213@gmail.com
In reply to: Tom Lane (#69)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Thank you Tom and Heikki for your input.

On Sun, Dec 6, 2020 at 4:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Heikki Linnakangas <hlinnaka@iki.fi> writes:

I can understand why we need EquivalenceClass for UniqueKey, but I can't
understand why we need opfamily here.

Thinking a bit harder, I guess we don't. Because EquivalenceClass
includes the operator family already, in the ec_opfamilies field.

No. EquivalenceClasses only care about equality, which is why they
might potentially mention several opfamilies that share an equality
operator. If you care about sort order, you *cannot* rely on an
EquivalenceClass to depict that. Now, abstract uniqueness also only
cares about equality, but if you are going to implement it via sort-
and-unique then you need to settle on a sort order.

I think UniqueKey only cares about equality. Even though DISTINCT/GROUP BY
can be implemented with a sort, UniqueKey only cares about the result of
the DISTINCT/GROUP BY, so the sort order doesn't matter, IIUC.

I agree we are overspecifying DISTINCT by settling on a sort operator at
parse time, rather than considering all the possibilities at plan time.
But given that opfamilies sharing equality are mostly a hypothetical
use-case, I'm not in a big hurry to fix it. Before we had ASC/DESC
indexes, there was a real use-case for making a "reverse sort" opclass,
with the same equality as the type's regular opclass but the opposite sort
order. But that's ancient history now, and I've seen few other plausible
use-cases.

I have not been following this thread closely enough to understand
why we need a new "UniqueKeys" data structure at all.

Currently the UniqueKey is defined as a List of Expr, rather than
EquivalenceClasses. A complete discussion up to now can be found at [1]
(the messages I replied to also matter a lot, and together the
information is complete). This patch has been stalled at this point for
a while; I'm planning to try EquivalenceClasses, but any suggestion
would be welcome.

But if the
motivation is only to remove this overspecification, I humbly suggest
that it ain't worth the trouble.

regards, tom lane

[1]: /messages/by-id/CAKU4AWqy3Uv67=PR8RXG6LVoO-cMEwfW_LMwTxHdGrnu+cf+dA@mail.gmail.com

--
Best Regards
Andy Fan

#71Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Andy Fan (#70)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Hi,

On 12/5/20 10:38 PM, Andy Fan wrote:

Currently the UniqueKey is defined as a List of Expr, rather than
EquivalenceClasses.
A complete discussion until now can be found at [1] (The messages I replied
to also
care a lot and the information is completed). This patch has stopped at
this place for
a while, I'm planning to try EquivalenceClasses, but any suggestion would
be welcome.

Unfortunately, I think we need an RfC-style patch of both versions in
their minimal implementations.

Hopefully this will make it easier for one or more committers to decide
on the right direction since they can do a side-by-side comparison of
the two solutions.

Just my $0.02.

Thanks for working on this Andy !

Best regards,
Jesper

#72Andy Fan
zhihui.fan1213@gmail.com
In reply to: Jesper Pedersen (#71)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Mon, Dec 7, 2020 at 4:16 PM Jesper Pedersen <jesper.pedersen@redhat.com>
wrote:

Hi,

On 12/5/20 10:38 PM, Andy Fan wrote:

Currently the UniqueKey is defined as a List of Expr, rather than
EquivalenceClasses.
A complete discussion until now can be found at [1] (The messages I replied
to also care a lot and the information is completed). This patch has
stopped at this place for a while, I'm planning to try
EquivalenceClasses, but any suggestion would be welcome.

Unfortunately I think we need a RfC style patch of both versions in
their minimum implementation.

Hopefully this will make it easier for one or more committers to decide
on the right direction since they can do a side-by-side comparison of
the two solutions.

I had the exact same idea. Actually, I made EquivalenceClasses
work with baserels last weekend, and then I realized it is hard to compare
the two approaches without looking at the real/PoC code, even for very
experienced people. I will submit a new patch after I get the partitioned
relation and subquery cases working. Hope I can make it within one week.

Just my $0.02.

Thanks for working on this Andy !

Best regards,
Jesper

--
Best Regards
Andy Fan

#73Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andy Fan (#70)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Sun, Dec 6, 2020 at 9:09 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

I have not been following this thread closely enough to understand
why we need a new "UniqueKeys" data structure at all.

Currently the UniqueKey is defined as a List of Expr, rather than EquivalenceClasses.
A complete discussion until now can be found at [1] (The messages I replied to also
care a lot and the information is completed). This patch has stopped at this place for
a while, I'm planning to try EquivalenceClasses, but any suggestion would be welcome.

But if the
motivation is only to remove this overspecification, I humbly suggest
that it ain't worth the trouble.

AFAIK, the simple answer is we need some way to tell that certain
expressions together form a unique key for a given relation. E.g. the
GROUP BY clause forms a unique key for the output of the GROUP BY.
Pathkeys have a stronger requirement, that the relation is ordered on
that expression, which may not be the case with uniqueness, e.g. the
output of a GROUP BY produced by hash grouping. To me it's "Pathkeys
minus ordering", so we could use Pathkeys with reduced strength. But
that might affect a lot of places which depend upon the stronger pathkeys.

--
Best Wishes,
Ashutosh Bapat

#74David Rowley
dgrowleyml@gmail.com
In reply to: Andy Fan (#67)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Sun, 6 Dec 2020 at 04:10, Andy Fan <zhihui.fan1213@gmail.com> wrote:

For anyone who is interested in these patchsets, here is my plan now:
1) I will try EquivalenceClass rather than Expr in UniqueKey, and add
opfamily if needed.

I agree that we should be storing them in EquivalenceClasses. Apart
from what was mentioned already, it also allows the optimisation to work
in cases like:

create table t (a int not null unique, b int);
select distinct b from t where a = b;

David

#75Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Andy Fan (#72)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Hi Andy,

On Mon, Dec 7, 2020 at 9:15 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Mon, Dec 7, 2020 at 4:16 PM Jesper Pedersen <jesper.pedersen@redhat.com> wrote:

Hi,

On 12/5/20 10:38 PM, Andy Fan wrote:

Currently the UniqueKey is defined as a List of Expr, rather than
EquivalenceClasses.
A complete discussion until now can be found at [1] (The messages I replied
to also
care a lot and the information is completed). This patch has stopped at
this place for
a while, I'm planning to try EquivalenceClasses, but any suggestion would
be welcome.

Unfortunately I think we need a RfC style patch of both versions in
their minimum implementation.

Hopefully this will make it easier for one or more committers to decide
on the right direction since they can do a side-by-side comparison of
the two solutions.

I do get the exact same idea. Actually I have made EquivalenceClasses
works with baserel last weekend and then I realized it is hard to compare
the 2 situations without looking into the real/Poc code, even for very
experienced people. I will submit a new patch after I get the partitioned
relation, subquery works. Hope I can make it in one week.

Status update for a commitfest entry.

Are you planning to submit a new patch? Or is there any blocker for
this work? This patch entry has been in the "Waiting on Author" state
on the CF app for a while. If there is any update, please reflect it on
the CF app.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#76Andy Fan
zhihui.fan1213@gmail.com
In reply to: Masahiko Sawada (#75)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Hi Masahiko:

On Fri, Jan 22, 2021 at 9:15 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

Hi Andy,

On Mon, Dec 7, 2020 at 9:15 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Mon, Dec 7, 2020 at 4:16 PM Jesper Pedersen <jesper.pedersen@redhat.com>
wrote:

Hi,

On 12/5/20 10:38 PM, Andy Fan wrote:

Currently the UniqueKey is defined as a List of Expr, rather than
EquivalenceClasses.
A complete discussion until now can be found at [1] (The messages I replied
to also care a lot and the information is completed). This patch has
stopped at this place for a while, I'm planning to try
EquivalenceClasses, but any suggestion would be welcome.

Unfortunately I think we need a RfC style patch of both versions in
their minimum implementation.

Hopefully this will make it easier for one or more committers to decide
on the right direction since they can do a side-by-side comparison of
the two solutions.

I do get the exact same idea. Actually I have made EquivalenceClasses
works with baserel last weekend and then I realized it is hard to compare
the 2 situations without looking into the real/Poc code, even for very
experienced people. I will submit a new patch after I get the partitioned
relation, subquery works. Hope I can make it in one week.

Status update for a commitfest entry.

Are you planning to submit a new patch? Or is there any blocker for
this work? This patch entry on CF app has been in state Waiting on
Author for a while. If there is any update on that, please reflect on
CF app.

I agree that the current status is "Waiting on Author", and there is no
blocking issue for others. I plan to work on this within one month; I
have to get my current urgent task completed first. Sorry for the
delayed action, and thanks for asking.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#77Andy Fan
zhihui.fan1213@gmail.com
In reply to: Andy Fan (#76)
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

On Sun, Jan 24, 2021 at 6:26 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi Masahiko:

On Fri, Jan 22, 2021 at 9:15 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

Hi Andy,

On Mon, Dec 7, 2020 at 9:15 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Mon, Dec 7, 2020 at 4:16 PM Jesper Pedersen <jesper.pedersen@redhat.com>
wrote:

Hi,

On 12/5/20 10:38 PM, Andy Fan wrote:

Currently the UniqueKey is defined as a List of Expr, rather than
EquivalenceClasses.
A complete discussion until now can be found at [1] (The messages I replied
to also care a lot and the information is completed). This patch has
stopped at this place for a while, I'm planning to try
EquivalenceClasses, but any suggestion would be welcome.

Unfortunately I think we need a RfC style patch of both versions in
their minimum implementation.

Hopefully this will make it easier for one or more committers to decide
on the right direction since they can do a side-by-side comparison of
the two solutions.

I do get the exact same idea. Actually I have made EquivalenceClasses
works with baserel last weekend and then I realized it is hard to compare
the 2 situations without looking into the real/Poc code, even for very
experienced people. I will submit a new patch after I get the partitioned
relation, subquery works. Hope I can make it in one week.

Status update for a commitfest entry.

Are you planning to submit a new patch? Or is there any blocker for
this work? This patch entry on CF app has been in state Waiting on
Author for a while. If there is any update on that, please reflect on
CF app.

I agree that the current status is "Waiting on author", and no block
issue for others. I plan to work on this in 1 month. I have to get my
current urgent case completed first. Sorry for the delay action and
thanks for asking.

I've started to continue this work today. At the same time, I will
split the multi-patch series into some dedicated small chunks for easier
review. The first one just adds a notnullattrs field to the RelOptInfo
struct, in thread [1].

[1]: /messages/by-id/CAKU4AWpQjAqJwQ2X-aR9g3+ZHRzU1k8hNP7A+_mLuOv-n5aVKA@mail.gmail.com

--
Best Regards
Andy Fan (https://www.aliyun.com/)