[PATCH] query rewrite for distinct stage under some cases
Hi Hackers:
This is a patch for unique elimination rewrite for distinct query.
The distinct stage can cost a lot for a big result set and is sometimes
not necessary. The basic idea is that the unique node in cases like the
following can be eliminated.
1. select distinct pk, ... from t;
2. select distinct uk-col1, uk-col2, ...
from t where uk-col1 is not null and uk-col2 is not null;
3. select distinct a, b .... from t group by a, b;
4. select distinct t1.pk, t2.pk, ... from t1, t2.
The distinct keyword in the above SQL is obviously redundant,
but a SQL statement may join a lot of tables with tens of columns in the
target list and a number of indexes. Such a statement can be hidden among
hundreds of SQL statements in a system, and it will be hard to find it.
That's why I want the kernel to keep watching for it,
provided that doing so does not cost too much. Oracle has a similar rewrite
as well.
The rule for a single relation is:
a). The primary key is chosen in the target list.
b). The unique key is chosen in the target list,
and we can tell that the related columns are not nullable;
we can tell this from the catalog and the quals.
c). The group-by columns are chosen in the target list.
d). The target list in the subquery has a distinct already.
(select distinct xxx from (select distinct xxx from t2));
The rule for a multi-relation join is:
e). if every relation yields a unique result, then the result of the join
will be unique as well
If an sql matches any rule of above, we can remove the unique node.
Rule d) is not so common and complex to implement, so it is not
included in this patch.
Implementation:
f). I collect the target list per table; if hasDistinctOn is set, the
source
is the intersection of the target list and the distinctClause; otherwise,
the source is the target list only.
g). the pk/uk columns information is gathered
by RelationGetIndexAttrBitmap.
A new field RelationData.rd_plain_ukattrs is added and gathered as
well.
h). At last, if any rule matches, Query->distinctClause &
Query->hasDistinctOn
will be cleared to avoid generating the related path.
There are also some fast paths to return earlier:
i). If a table is in the join-list, but none of its columns are chosen in the target list.
j). The join-list contains sub-query. (this rewrite happens after
sub-query pull-up)
k). Based on the cost of the checking, we check group by first and
then PK and then UK + not null.
There is no impact for non-distinct query, as for distinct query, this rule
will
increase the total cost a bit if the distinct can't be removed. The unique
check is most expensive, so here is the data to show the impact, a 4
columns table, no pk, 1 uk with 2 columns.
With this feature disable: avg plan time: 0.095ms
With this feature enabled: avg plan time: 0.102ms
Basically I think the cost would be ok.
Concurrency:
l). If we see a pk or uk index while the index is concurrently dropped in
another session,
I think this would be ok because of MVCC rules.
m). If an index is being created in another session but is not yet
completed,
we can't see it with RelationGetIndexAttrBitmap, so it should be ok
as well.
The behavior can be changed online with enable_unique_elimination,
it is true by default.
The patch is generated with the latest code on github,
and the current HEAD is 34a0a81bfb388504deaa51b16a8bb531b827e519.
The make installcheck-world & check-world have passed.
Test case join.sql and sysview.sql are impacted by this change
and they are expected, the changed expected.out file is included in this
patch.
Please let me know if you have any questions.
Thank you
Attachments:
0001-Avoid-the-distinct-stage-if-the-result-is-unique-alr.patchapplication/octet-stream; name=0001-Avoid-the-distinct-stage-if-the-result-is-unique-alr.patchDownload
From 11229f342dd2887f3c5a925b201392508b4ff516 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Mon, 20 Jan 2020 11:02:47 +0800
Subject: [PATCH] Avoid the distinct stage if the result is unique already
For a single relation, we can tell it by:
1. The pk is in the target list.
2. The uk is in the target list and its columns are not null
3. The columns in the group-by clause are also in the target list
For a relation join, we can tell it by:
1. if every relation in the jointree yields a unique result set
The final result will be unique as well regardless of the join method
---
src/backend/nodes/bitmapset.c | 38 +++
src/backend/optimizer/path/costsize.c | 1 +
src/backend/optimizer/plan/planner.c | 216 ++++++++++++++++++
src/backend/utils/cache/relcache.c | 24 +-
src/backend/utils/misc/guc.c | 10 +
src/include/nodes/bitmapset.h | 2 +
src/include/optimizer/cost.h | 1 +
src/include/utils/rel.h | 3 +
src/test/regress/expected/join.out | 16 +-
.../regress/expected/select_distinct_2.out | 88 +++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/select_distinct_2.sql | 31 +++
12 files changed, 423 insertions(+), 10 deletions(-)
create mode 100644 src/test/regress/expected/select_distinct_2.out
create mode 100644 src/test/regress/sql/select_distinct_2.sql
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 648cc1a7eb..8fa3312e91 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -1167,3 +1167,41 @@ bms_hash_value(const Bitmapset *a)
return DatumGetUInt32(hash_any((const unsigned char *) a->words,
(lastword + 1) * sizeof(bitmapword)));
}
+
+/*
+ * bms_array_copy -- copy the bms data in the newly palloc allocated memory
+ */
+
+Bitmapset**
+bms_array_copy(Bitmapset **bms_array, int len)
+{
+ Bitmapset **res;
+ int i;
+ if (bms_array == NULL || len < 1)
+ return NULL;
+
+ res = palloc(sizeof(Bitmapset*) * len);
+ for(i = 0; i < len; i++)
+ {
+ res[i] = bms_copy(bms_array[i]);
+ }
+ return res;
+}
+
+/*
+ * bms_array_free
+ *
+ * free each element in the array one by one, then free the array itself
+ */
+void
+bms_array_free(Bitmapset **bms_array, int len)
+{
+ int idx;
+ if (bms_array == NULL)
+ return;
+ for(idx = 0 ; idx < len; idx++)
+ {
+ bms_free(bms_array[idx]);
+ }
+ pfree(bms_array);
+}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b5a0033721..3ac9aa5137 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -138,6 +138,7 @@ bool enable_partitionwise_aggregate = false;
bool enable_parallel_append = true;
bool enable_parallel_hash = true;
bool enable_partition_pruning = true;
+bool enable_unique_elimination = true;
typedef struct
{
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d6f2153593..71b9f2df15 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -22,8 +22,10 @@
#include "access/htup_details.h"
#include "access/parallel.h"
#include "access/sysattr.h"
+#include "access/relation.h"
#include "access/table.h"
#include "access/xact.h"
+#include "catalog/index.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_proc.h"
@@ -35,6 +37,7 @@
#include "lib/bipartite_match.h"
#include "lib/knapsack.h"
#include "miscadmin.h"
+#include "nodes/bitmapset.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#ifdef OPTIMIZER_DEBUG
@@ -248,6 +251,7 @@ static bool group_by_has_partkey(RelOptInfo *input_rel,
List *targetList,
List *groupClause);
static int common_prefix_cmp(const void *a, const void *b);
+static void preprocess_unique_node(PlannerInfo *root);
/*****************************************************************************
@@ -989,6 +993,9 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
/* Remove any redundant GROUP BY columns */
remove_useless_groupby_columns(root);
+ if (enable_unique_elimination)
+ preprocess_unique_node(root);
+
/*
* If we have any outer joins, try to reduce them to plain inner joins.
* This step is most easily done after we've done expression
@@ -7409,3 +7416,212 @@ group_by_has_partkey(RelOptInfo *input_rel,
return true;
}
+
+/*
+ * is_unique_result_already
+ *
+ * Given a relation, we know its primary key and unique key information.
+ * unique_target is the set of columns in the distinct/distinct-on target list.
+ * not_null_columns is a union of not-null columns based on the catalog and quals.
+ * Then we can tell the result is unique, before executing the query, if
+ * the primary key, or a uk whose columns are all not null, is in the target list.
+ */
+static bool
+is_unique_result_already(Relation relation,
+ Bitmapset *unique_target,
+ Bitmapset *not_null_columns)
+{
+ int i;
+ Bitmapset *pkattr = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_PRIMARY_KEY);
+
+ /*
+ * if the pk is in the target list,
+ * the result set is unique for this relation
+ */
+ if (pkattr != NULL &&
+ !bms_is_empty(pkattr) &&
+ bms_is_subset(pkattr, unique_target))
+ {
+ return true;
+ }
+
+ /*
+ * check if any unique key is in the target list and its columns are known not-null
+ */
+ for (i = 0; i < relation->rd_plain_ukcount; i++)
+ {
+ Bitmapset *ukattr = relation->rd_plain_ukattrs[i];
+ if (!bms_is_empty(ukattr)
+ && bms_is_subset(ukattr, unique_target)
+ && bms_is_subset(ukattr, not_null_columns))
+ return true;
+ }
+
+ /*
+ * Neither the pk nor any not-null uk is covered by the target
+ * list, so we cannot prove the result is unique.
+ */
+
+ return false;
+}
+
+/*
+ * preprocess_unique_node
+ *
+ * remove the distinctClause if it is not necessary
+ */
+static void
+preprocess_unique_node(PlannerInfo *root)
+{
+ Query *query = root->parse;
+ ListCell *lc;
+ int num_of_rtables;
+ Bitmapset **target_columns_by_table = NULL;
+ Bitmapset **notnullcolumns = NULL;
+ Index rel_idx;
+ bool should_unique_elimination = false;
+
+ if (query->distinctClause == NIL)
+ return;
+
+ foreach(lc, query->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+ if (rte->rtekind != RTE_RELATION)
+ return;
+ }
+
+ num_of_rtables = list_length(query->rtable);
+
+ /* if the group clause in the target list, we don't need distinct */
+ if (query->groupClause != NIL)
+ {
+ Bitmapset *groupclause_bitmap = NULL, *groupclause_in_target_bitmap = NULL;
+ ListCell *lc;
+ foreach(lc, query->groupClause)
+ groupclause_bitmap = bms_add_member(groupclause_bitmap,
+ lfirst_node(SortGroupClause, lc)->tleSortGroupRef);
+
+ foreach(lc, query->targetList)
+ {
+ TargetEntry *te = lfirst_node(TargetEntry, lc);
+ if (te->resjunk)
+ continue;
+ groupclause_in_target_bitmap = bms_add_member(groupclause_in_target_bitmap,
+ te->ressortgroupref);
+ }
+
+ should_unique_elimination = bms_is_subset(groupclause_bitmap,
+ groupclause_in_target_bitmap);
+ bms_free(groupclause_bitmap);
+ bms_free(groupclause_in_target_bitmap);
+ if (should_unique_elimination)
+ goto ret;
+ }
+
+ target_columns_by_table = palloc0(sizeof(Bitmapset*) * num_of_rtables);
+ notnullcolumns = palloc0(sizeof(Bitmapset* ) * num_of_rtables);
+
+
+ /* build a unique targetlist bitmapset, handle distincton differently */
+ foreach(lc, query->targetList)
+ {
+ TargetEntry *te = lfirst_node(TargetEntry, lc);
+ Expr *expr = te->expr;
+ Var *var;
+ Bitmapset **target_column_per_rel;
+ int target_attno;
+
+ if (!IsA(expr, Var))
+ continue;
+ var = (Var *)(expr);
+ if (var->varlevelsup != 0)
+ continue;
+
+ target_column_per_rel = &target_columns_by_table[var->varno - 1];
+ target_attno = var->varattno - FirstLowInvalidHeapAttributeNumber;
+
+ if (query->hasDistinctOn)
+ {
+ Index ref = te->ressortgroupref;
+ ListCell *lc;
+
+ if (ref == 0)
+ continue;
+
+ foreach(lc, query->distinctClause)
+ {
+ if (ref == lfirst_node(SortGroupClause, lc)->tleSortGroupRef)
+ *target_column_per_rel = bms_add_member(*target_column_per_rel,
+ target_attno);
+ }
+ }
+ else
+ *target_column_per_rel = bms_add_member(*target_column_per_rel,
+ target_attno);
+ }
+
+ /* find out nonnull columns from qual */
+ foreach(lc, find_nonnullable_vars(query->jointree->quals))
+ {
+ Var *not_null_var;
+ Bitmapset **notnullcolumns_per_rel;
+ int notnull_attno;
+ if (!IsA(lfirst(lc), Var))
+ continue;
+ not_null_var = lfirst_node(Var, lc);
+ if (not_null_var->varno == INNER_VAR ||
+ not_null_var->varno == OUTER_VAR ||
+ not_null_var->varno == INDEX_VAR)
+ continue;
+ notnullcolumns_per_rel = ¬nullcolumns[not_null_var->varno - 1];
+ notnull_attno = not_null_var->varattno - FirstLowInvalidHeapAttributeNumber;
+ *notnullcolumns_per_rel = bms_add_member(*notnullcolumns_per_rel,
+ notnull_attno);
+ }
+
+ /* Check if every related rtable can yield a unique result set */
+ rel_idx = 0;
+ foreach(lc, query->rtable)
+ {
+ RangeTblEntry *te = lfirst_node(RangeTblEntry, lc);
+ Relation relation = relation_open(te->relid, RowExclusiveLock);
+ int attr_idx = 0;
+ TupleDesc desc = relation->rd_att;
+
+ /* check non-nullable in catalog */
+ for(; attr_idx < desc->natts; attr_idx++)
+ {
+ int notnull_attno;
+ if (!desc->attrs[attr_idx].attnotnull)
+ continue;
+ notnull_attno = attr_idx + 1 - FirstLowInvalidHeapAttributeNumber;
+ notnullcolumns[rel_idx] = bms_add_member(notnullcolumns[rel_idx],
+ notnull_attno);
+ }
+
+ /* check whether this relation already yields a unique result */
+ if (!is_unique_result_already(relation,
+ target_columns_by_table[rel_idx],
+ notnullcolumns[rel_idx]))
+ {
+ RelationClose(relation);
+ goto ret;
+ }
+ RelationClose(relation);
+ rel_idx++;
+ }
+
+ should_unique_elimination = true;
+
+ ret:
+ bms_array_free(notnullcolumns, num_of_rtables);
+ bms_array_free(target_columns_by_table, num_of_rtables);
+
+ if (should_unique_elimination)
+ {
+ query->distinctClause = NIL;
+ query->hasDistinctOn = false;
+ }
+}
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index df025a5a30..0d2e320ce8 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2346,6 +2346,8 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
bms_free(relation->rd_keyattr);
bms_free(relation->rd_pkattr);
bms_free(relation->rd_idattr);
+ if (relation->rd_plain_ukattrs)
+ bms_array_free(relation->rd_plain_ukattrs, relation->rd_plain_ukcount);
if (relation->rd_pubactions)
pfree(relation->rd_pubactions);
if (relation->rd_options)
@@ -4762,6 +4764,7 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
Bitmapset *indexattrs; /* indexed columns */
Bitmapset *uindexattrs; /* columns in unique indexes */
Bitmapset *pkindexattrs; /* columns in the primary index */
+ Bitmapset **ukindexattrs = NULL; /* columns in the unique indexes */
Bitmapset *idindexattrs; /* columns in the replica identity */
List *indexoidlist;
List *newindexoidlist;
@@ -4769,6 +4772,7 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
Oid relreplindex;
ListCell *l;
MemoryContext oldcxt;
+ int plain_uk_index_count = 0, index_count = 0, indexno = 0;
/* Quick exit if we already computed the result. */
if (relation->rd_indexattr != NULL)
@@ -4826,6 +4830,9 @@ restart:
uindexattrs = NULL;
pkindexattrs = NULL;
idindexattrs = NULL;
+ index_count = list_length(indexoidlist);
+ ukindexattrs = palloc0(sizeof(Bitmapset *) * index_count);
+
foreach(l, indexoidlist)
{
Oid indexOid = lfirst_oid(l);
@@ -4875,6 +4882,9 @@ restart:
/* Is this index the configured (or default) replica identity? */
isIDKey = (indexOid == relreplindex);
+ if (isKey)
+ plain_uk_index_count++;
+
/* Collect simple attribute references */
for (i = 0; i < indexDesc->rd_index->indnatts; i++)
{
@@ -4904,6 +4914,11 @@ restart:
if (isIDKey && i < indexDesc->rd_index->indnkeyatts)
idindexattrs = bms_add_member(idindexattrs,
attrnum - FirstLowInvalidHeapAttributeNumber);
+
+ if (isKey)
+ ukindexattrs[indexno] = bms_add_member(ukindexattrs[indexno],
+ attrnum - FirstLowInvalidHeapAttributeNumber);
+
}
}
@@ -4914,6 +4929,7 @@ restart:
pull_varattnos(indexPredicate, 1, &indexattrs);
index_close(indexDesc, AccessShareLock);
+ indexno++;
}
/*
@@ -4940,6 +4956,7 @@ restart:
bms_free(pkindexattrs);
bms_free(idindexattrs);
bms_free(indexattrs);
+ bms_array_free(ukindexattrs, index_count);
goto restart;
}
@@ -4953,7 +4970,8 @@ restart:
relation->rd_pkattr = NULL;
bms_free(relation->rd_idattr);
relation->rd_idattr = NULL;
-
+ bms_array_free(relation->rd_plain_ukattrs, relation->rd_plain_ukcount);
+ relation->rd_plain_ukattrs = NULL;
/*
* Now save copies of the bitmaps in the relcache entry. We intentionally
* set rd_indexattr last, because that's the one that signals validity of
@@ -4966,6 +4984,8 @@ restart:
relation->rd_pkattr = bms_copy(pkindexattrs);
relation->rd_idattr = bms_copy(idindexattrs);
relation->rd_indexattr = bms_copy(indexattrs);
+ relation->rd_plain_ukattrs = bms_array_copy(ukindexattrs, index_count);
+ relation->rd_plain_ukcount = plain_uk_index_count;
MemoryContextSwitchTo(oldcxt);
/* We return our original working copy for caller to play with */
@@ -5618,6 +5638,8 @@ load_relcache_init_file(bool shared)
rel->rd_keyattr = NULL;
rel->rd_pkattr = NULL;
rel->rd_idattr = NULL;
+ rel->rd_plain_ukattrs = NULL;
+ rel->rd_plain_ukcount = 0;
rel->rd_pubactions = NULL;
rel->rd_statvalid = false;
rel->rd_statlist = NIL;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e44f71e991..1479ec0b5d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1064,6 +1064,16 @@ static struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_unique_elimination", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables plan-time and run-time unique elimination."),
+ gettext_noop("Allows the query planner to remove the unnecessary distinct clause."),
+ GUC_EXPLAIN
+ },
+ &enable_unique_elimination,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index b7b18a0b68..ff30feb521 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -117,4 +117,6 @@ extern int bms_prev_member(const Bitmapset *a, int prevbit);
/* support for hashtables using Bitmapsets as keys: */
extern uint32 bms_hash_value(const Bitmapset *a);
+extern Bitmapset **bms_array_copy(Bitmapset **bms_array, int len);
+extern void bms_array_free(Bitmapset **bms_array, int len);
#endif /* BITMAPSET_H */
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index cb012ba198..bb0353b46f 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -64,6 +64,7 @@ extern PGDLLIMPORT bool enable_partitionwise_aggregate;
extern PGDLLIMPORT bool enable_parallel_append;
extern PGDLLIMPORT bool enable_parallel_hash;
extern PGDLLIMPORT bool enable_partition_pruning;
+extern PGDLLIMPORT bool enable_unique_elimination;
extern PGDLLIMPORT int constraint_exclusion;
extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 44ed04dd3f..7c5a6d65b6 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -120,6 +120,9 @@ typedef struct RelationData
Bitmapset *rd_indexattr; /* identifies columns used in indexes */
Bitmapset *rd_keyattr; /* cols that can be ref'd by foreign keys */
Bitmapset *rd_pkattr; /* cols included in primary key */
+ Bitmapset **rd_plain_ukattrs; /* cols included in the plain unique indexes,
+ only non-expression, non-partial columns are counted */
+ int rd_plain_ukcount; /* the number of plain unique indexes */
Bitmapset *rd_idattr; /* included in replica identity index */
PublicationActions *rd_pubactions; /* publication actions */
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 761376b007..3f6595d53b 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4433,17 +4433,17 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
explain (costs off)
select d.* from d left join (select distinct * from b) s
on d.a = s.id;
- QUERY PLAN
---------------------------------------
- Merge Right Join
- Merge Cond: (b.id = d.a)
- -> Unique
- -> Sort
- Sort Key: b.id, b.c_id
- -> Seq Scan on b
+ QUERY PLAN
+---------------------------------
+ Merge Left Join
+ Merge Cond: (d.a = s.id)
-> Sort
Sort Key: d.a
-> Seq Scan on d
+ -> Sort
+ Sort Key: s.id
+ -> Subquery Scan on s
+ -> Seq Scan on b
(9 rows)
-- check join removal works when uniqueness of the join condition is enforced
diff --git a/src/test/regress/expected/select_distinct_2.out b/src/test/regress/expected/select_distinct_2.out
new file mode 100644
index 0000000000..d5c8f818af
--- /dev/null
+++ b/src/test/regress/expected/select_distinct_2.out
@@ -0,0 +1,88 @@
+create table select_distinct_a(a int, b char(20), c char(20) not null, d int, e int, primary key(a, b));
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+-- no node for distinct.
+explain (costs off) select distinct * from select_distinct_a;
+ QUERY PLAN
+-------------------------------
+ Seq Scan on select_distinct_a
+(1 row)
+
+explain (costs off) select distinct b,c,d,e from select_distinct_a;
+ QUERY PLAN
+-------------------------------------
+ HashAggregate
+ Group Key: b, c, d, e
+ -> Seq Scan on select_distinct_a
+(3 rows)
+
+create unique index select_distinct_a_uk on select_distinct_a(c, d);
+explain (costs off) select distinct b,c,d,e from select_distinct_a where c is not null;
+ QUERY PLAN
+-------------------------------------
+ HashAggregate
+ Group Key: b, c, d, e
+ -> Seq Scan on select_distinct_a
+ Filter: (c IS NOT NULL)
+(4 rows)
+
+explain (costs off) select distinct b,c,d,e from select_distinct_a where c is not null and d is not null;
+ QUERY PLAN
+-------------------------------------------------
+ Seq Scan on select_distinct_a
+ Filter: ((c IS NOT NULL) AND (d IS NOT NULL))
+(2 rows)
+
+explain select distinct d, e from select_distinct_a group by d, e;
+ QUERY PLAN
+--------------------------------------------------------------------------
+ HashAggregate (cost=15.85..17.85 rows=200 width=8)
+ Group Key: d, e
+ -> Seq Scan on select_distinct_a (cost=0.00..13.90 rows=390 width=8)
+(3 rows)
+
+create table select_distinct_b(a int, b char(20), c char(20) not null, d int, e int, primary key(a, b));
+explain (costs off) select distinct * from select_distinct_a a, select_distinct_b b;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on select_distinct_a a
+ -> Materialize
+ -> Seq Scan on select_distinct_b b
+(4 rows)
+
+explain (costs off) select distinct a.b, a.c, b.a, b.b from select_distinct_a a, select_distinct_b b;
+ QUERY PLAN
+---------------------------------------------------------
+ Unique
+ -> Sort
+ Sort Key: a.b, a.c, b.a, b.b
+ -> Nested Loop
+ -> Seq Scan on select_distinct_a a
+ -> Materialize
+ -> Seq Scan on select_distinct_b b
+(7 rows)
+
+explain (costs off) select distinct a.d, a.c, b.a, b.b from select_distinct_a a, select_distinct_b b where a.d is not null;
+ QUERY PLAN
+---------------------------------------------
+ Nested Loop
+ -> Seq Scan on select_distinct_b b
+ -> Materialize
+ -> Seq Scan on select_distinct_a a
+ Filter: (d IS NOT NULL)
+(5 rows)
+
+explain (costs off) select distinct a.d, b.a from select_distinct_a a, select_distinct_b b group by a.d, b.a;
+ QUERY PLAN
+---------------------------------------------------
+ HashAggregate
+ Group Key: a.d, b.a
+ -> Nested Loop
+ -> Seq Scan on select_distinct_a a
+ -> Materialize
+ -> Seq Scan on select_distinct_b b
+(6 rows)
+
+drop table select_distinct_a;
+drop table select_distinct_b;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index a1c90eb905..d297a119ac 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -89,7 +89,8 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(17 rows)
+ enable_unique_elimination | on
+(18 rows)
-- Test that the pg_timezone_names and pg_timezone_abbrevs views are
-- more-or-less working. We can't test their contents in any great detail
diff --git a/src/test/regress/sql/select_distinct_2.sql b/src/test/regress/sql/select_distinct_2.sql
new file mode 100644
index 0000000000..d236aa9168
--- /dev/null
+++ b/src/test/regress/sql/select_distinct_2.sql
@@ -0,0 +1,31 @@
+create table select_distinct_a(a int, b char(20), c char(20) not null, d int, e int, primary key(a, b));
+
+set enable_mergejoin to off;
+set enable_hashjoin to off;
+
+-- no node for distinct.
+explain (costs off) select distinct * from select_distinct_a;
+
+explain (costs off) select distinct b,c,d,e from select_distinct_a;
+
+create unique index select_distinct_a_uk on select_distinct_a(c, d);
+
+explain (costs off) select distinct b,c,d,e from select_distinct_a where c is not null;
+
+explain (costs off) select distinct b,c,d,e from select_distinct_a where c is not null and d is not null;
+
+explain select distinct d, e from select_distinct_a group by d, e;
+
+
+create table select_distinct_b(a int, b char(20), c char(20) not null, d int, e int, primary key(a, b));
+
+explain (costs off) select distinct * from select_distinct_a a, select_distinct_b b;
+
+explain (costs off) select distinct a.b, a.c, b.a, b.b from select_distinct_a a, select_distinct_b b;
+
+explain (costs off) select distinct a.d, a.c, b.a, b.b from select_distinct_a a, select_distinct_b b where a.d is not null;
+
+explain (costs off) select distinct a.d, b.a from select_distinct_a a, select_distinct_b b group by a.d, b.a;
+
+drop table select_distinct_a;
+drop table select_distinct_b;
--
2.20.1 (Apple Git-117)