UniqueKey on Partitioned table.

Started by Andy Fanalmost 5 years ago12 messages

zhihui.fan1213@gmail.com

almost 5 years ago

2 attachment(s)

Suppose we have partitioned table defined as below:

P(a, b, c, d) partition by (a)
p1 (a=1)
p2 (a=2)
p3 (a=3)

Since in PG, we can define different indexes among partitions, so each
child may
have different UniqueKeys, and some of them might be invalidated in parent's
level. For example, 1). we only define unique index p1(c), so (c) would be
an
uniquekey of p1 only. so it is invalidate on appendrel level. 2). We
define
unique index p_n(c) on each childrel, so every childrel has UniqueKey
(c). However it is invalid on appendrel as well. 3). We define unique index
p_n(a, c), since a is the partition key, so (a, c) would be valid for both
child rel and parent rel.

In my v1 implementation[1]/messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com before, I maintained the child rel exactly the
same as
non-partitioned table. But when calculating the UniqueKey for partitioned
table, I first introduce a global_unique_indexlist which handles the above 3
cases. The indexes for case 1 and case 2 will not be in
global_unique_indexlist
but the index in case 3 will be even if they are only built in child level.
After
we have build the global_unique_indexlist on appendrel, we will build the
UnqiueKey exactly same as non partitioned table. With this way, I'm not
happy
with the above method now is because 1). the global_unique_indexlist is
build in
a hard way. 2). I have to totally ignored the UniqueKey on child level and
re-compute it on appendrel level. 3). The 3 cases should rarely happen in
real
life, I guess.

When I re-implement the UniqueKey with EquivalenceClass, I re-think about
how to
handle the above stuff. Now my preferred idea is just not handle it. When
building the
uniquekey on parent rel, we just handle 2 cases. If the appendrel only have
1
child, we just copy (and modified if needed due to col-order-mismatch case)
the
uniquekey. 2). Only handle the Unique index defined in top level, for this
case
it would yield the below situation.

create unique index on p(a, b); --> (A, B) will be the UniqueKey of p.
create unique index on p_nn(a, b); --> (A, B) will not be the UniqueKey of p
even we create it on ALL the child rel. The result is not perfect but I
think
it is practical. Any suggestions?

The attached is a UnqiueKey with EquivalenceClass patch, I just complete the
single relation part and may have bugs. I just attached it here for design
review only. and the not-null-attrs is just v1 which we can continue
discussing on
the original thread[2]/messages/by-id/CAKU4AWpQjAqJwQ2X-aR9g3+ZHRzU1k8hNP7A+_mLuOv-n5aVKA@mail.gmail.com.

[1]: /messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com
/messages/by-id/CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com
[2]: /messages/by-id/CAKU4AWpQjAqJwQ2X-aR9g3+ZHRzU1k8hNP7A+_mLuOv-n5aVKA@mail.gmail.com
/messages/by-id/CAKU4AWpQjAqJwQ2X-aR9g3+ZHRzU1k8hNP7A+_mLuOv-n5aVKA@mail.gmail.com

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Attachments:

v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patchapplication/octet-stream; name=v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patchDownload

From 30fd21df7361edeaea2f6078ad5139a53b14340d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Wed, 10 Feb 2021 10:58:06 +0800
Subject: [PATCH v1 1/2] Introduce notnullattrs field in RelOptInfo to indicate
 which attr

are not null in current query. The not null is judged by checking
pg_attribute and query's restrictinfo. The info is only maintained at
Base RelOptInfo and Partition's RelOptInfo level so far.
---
 src/backend/optimizer/path/allpaths.c  | 31 ++++++++++++++++++++++++++
 src/backend/optimizer/plan/initsplan.c | 10 +++++++++
 src/backend/optimizer/util/plancat.c   | 10 +++++++++
 src/include/nodes/pathnodes.h          |  2 ++
 4 files changed, 53 insertions(+)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..709d2f82ca 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -998,6 +998,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		RelOptInfo *childrel;
 		ListCell   *parentvars;
 		ListCell   *childvars;
+		int i = -1;
 
 		/* append_rel_list contains all append rels; ignore others */
 		if (appinfo->parent_relid != parentRTindex)
@@ -1054,6 +1055,36 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 								   (Node *) rel->reltarget->exprs,
 								   1, &appinfo);
 
+		/* Copy notnullattrs. */
+		while ((i = bms_next_member(rel->notnullattrs, i)) > 0)
+		{
+			AttrNumber attno = i + FirstLowInvalidHeapAttributeNumber;
+			AttrNumber child_attno;
+			if (attno == 0)
+			{
+				/* Whole row is not null, so must be same for child */
+				childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+														attno - FirstLowInvalidHeapAttributeNumber);
+				break;
+			}
+			if (attno < 0 )
+				/* no need to translate system column */
+				child_attno = attno;
+			else
+			{
+				Node * node = list_nth(appinfo->translated_vars, attno - 1);
+				if (!IsA(node, Var))
+					/* This may happens at UNION case, like (SELECT a FROM t1 UNION SELECT a + 3
+					 * FROM t2) t and we know t.a is not null
+					 */
+					continue;
+				child_attno = castNode(Var, node)->varattno;
+			}
+
+			childrel->notnullattrs = bms_add_member(childrel->notnullattrs,
+													child_attno - FirstLowInvalidHeapAttributeNumber);
+		}
+
 		/*
 		 * We have to make child entries in the EquivalenceClass data
 		 * structures as well.  This is needed either if the parent
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 02f813cebd..d27167dc76 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -829,6 +829,16 @@ deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
 		{
 			Node	   *qual = (Node *) lfirst(l);
 
+			/* Set the not null info now */
+			ListCell	*lc;
+			List		*non_nullable_vars = find_nonnullable_vars(qual);
+			foreach(lc, non_nullable_vars)
+			{
+				Var *var = lfirst_node(Var, lc);
+				RelOptInfo *rel = root->simple_rel_array[var->varno];
+				rel->notnullattrs = bms_add_member(rel->notnullattrs,
+												   var->varattno - FirstLowInvalidHeapAttributeNumber);
+			}
 			distribute_qual_to_rels(root, qual,
 									below_outer_join, JOIN_INNER,
 									root->qual_security_level,
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 177e6e336a..0c5a79f296 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -117,6 +117,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	int			i;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -474,6 +475,15 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	if (inhparent && relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		set_relation_partition_info(root, rel, relation);
 
+	Assert(rel->notnullattrs == NULL);
+	for(i = 0; i < relation->rd_att->natts; i++)
+	{
+		FormData_pg_attribute attr = relation->rd_att->attrs[i];
+		if (attr.attnotnull)
+			rel->notnullattrs = bms_add_member(rel->notnullattrs,
+											   attr.attnum - FirstLowInvalidHeapAttributeNumber);
+	}
+
 	table_close(relation, NoLock);
 
 	/*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ec93e648c..2d81f6216d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -710,6 +710,8 @@ typedef struct RelOptInfo
 	PlannerInfo *subroot;		/* if subquery */
 	List	   *subplan_params; /* if subquery */
 	int			rel_parallel_workers;	/* wanted number of parallel workers */
+	/* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */
+	Bitmapset		*notnullattrs;
 
 	/* Information about foreign tables and foreign joins */
 	Oid			serverid;		/* identifies server for the table or join */
-- 
2.21.0

v1-0002-UniqueKey-with-EquivalenceClass-for-single-rel-on.patchapplication/octet-stream; name=v1-0002-UniqueKey-with-EquivalenceClass-for-single-rel-on.patchDownload

From 7494fc8f9f9e48f6d19c5d6b7035fc297f413ac6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E4=B8=80=E6=8C=83?= <yizhi.fzh@alibaba-inc.com>
Date: Sat, 20 Feb 2021 10:07:57 +0800
Subject: [PATCH v1 2/2] UniqueKey with EquivalenceClass for single rel only.

---
 src/backend/optimizer/path/Makefile           |   3 +-
 src/backend/optimizer/path/allpaths.c         |   2 +
 src/backend/optimizer/path/equivclass.c       |  18 +
 src/backend/optimizer/path/uniquekey.c        | 405 ++++++++++++++++++
 src/backend/optimizer/plan/planner.c          |   9 +-
 src/backend/optimizer/util/relnode.c          |  24 ++
 src/backend/optimizer/util/restrictinfo.c     |  24 ++
 src/include/nodes/nodes.h                     |   1 +
 src/include/nodes/pathnodes.h                 |   8 +-
 src/include/optimizer/paths.h                 |  10 +
 src/include/optimizer/restrictinfo.h          |   3 +
 src/test/regress/expected/join.out            |  11 +-
 src/test/regress/expected/select_distinct.out |  77 ++++
 src/test/regress/sql/select_distinct.sql      |  28 ++
 14 files changed, 610 insertions(+), 13 deletions(-)
 create mode 100644 src/backend/optimizer/path/uniquekey.c

diff --git a/src/backend/optimizer/path/Makefile b/src/backend/optimizer/path/Makefile
index 1e199ff66f..63cc1505d9 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	joinpath.o \
 	joinrels.o \
 	pathkeys.o \
-	tidpath.o
+	tidpath.o \
+	uniquekey.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 709d2f82ca..376026ded3 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -579,6 +579,8 @@ set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	 */
 	check_index_predicates(root, rel);
 
+	populate_baserel_uniquekeys(root, rel, rel->indexlist);
+
 	/* Mark rel with estimated output rows, width, etc */
 	set_baserel_size_estimates(root, rel);
 }
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 0188c1e9a1..263d60b5c3 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -798,6 +798,24 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 	return NULL;
 }
 
+
+EquivalenceClass *
+find_ec_for_expr(PlannerInfo* root, Expr *expr, RelOptInfo *rel)
+{
+	int i = -1;
+	while ((i = bms_next_member(rel->eclass_indexes, i)) >= 0)
+	{
+		EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes, i);
+		ListCell	*lc;
+		foreach(lc, ec->ec_members)
+		{
+			EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
+			if (equal(expr, em->em_expr))
+				return ec;
+		}
+	}
+	return NULL;
+}
 /*
  * Find an equivalence class member expression that can be safely used to build
  * a sort node using the provided relation. The rules are a subset of those
diff --git a/src/backend/optimizer/path/uniquekey.c b/src/backend/optimizer/path/uniquekey.c
new file mode 100644
index 0000000000..41bdf22fc7
--- /dev/null
+++ b/src/backend/optimizer/path/uniquekey.c
@@ -0,0 +1,405 @@
+/*-------------------------------------------------------------------------
+ *
+ * uniquekeys.c
+ *	  Utilities for matching and building unique keys
+ *
+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/optimizer/path/uniquekeys.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "optimizer/pathnode.h"
+#include "optimizer/paths.h"
+#include "optimizer/appendinfo.h"
+#include "optimizer/optimizer.h"
+#include "optimizer/restrictinfo.h"
+#include "optimizer/tlist.h"
+#include "rewrite/rewriteManip.h"
+
+
+static List *make_simplified_unique_exprs(PlannerInfo *root,
+										  RelOptInfo *rel,
+										  List *unique_indexes,
+										  bool *matched_whole_index);
+static List *filter_useful_unique_exprs(RelOptInfo *rel,
+									   List *unique_exprs_list);
+static void make_uniquekey_from_exprs(RelOptInfo *rel,
+									  List *unique_exprs);
+static void make_usual_uniquekey(RelOptInfo *rel, List *exprs,
+								 bool multi_nullvals);
+static void make_onerow_uniquekey(RelOptInfo *rel);
+
+
+static List *get_usable_unique_indexes(List *indlist);
+static List *extract_const_opposite_exprs(List *restrict_list, List **opfamilies);
+
+/*
+ * populate_baserel_uniquekeys
+ *		Populate 'baserel' uniquekeys list by looking at the rel's unique index
+ * and baserestrictinfo.
+ */
+void
+populate_baserel_uniquekeys(PlannerInfo *root,
+							RelOptInfo *rel,
+							List *indexlist)
+{
+	List *unique_indexes = get_usable_unique_indexes(indexlist);
+	bool matched_awhole_uniqueindex = false;
+	List *unique_exprs_list;
+
+	if (unique_indexes == NIL)
+		return;
+
+	unique_exprs_list = make_simplified_unique_exprs(root, rel,
+													 unique_indexes,
+													 &matched_awhole_uniqueindex);
+
+	if (matched_awhole_uniqueindex)
+		make_onerow_uniquekey(rel);
+	else
+	{
+		ListCell	*lc;
+		foreach(lc, filter_useful_unique_exprs(rel, unique_exprs_list))
+		{
+			List	*unique_exprs = lfirst_node(List, lc);
+			make_uniquekey_from_exprs(rel, unique_exprs);
+		}
+	}
+}
+
+
+typedef struct UniqueExprsReduceContext
+{
+	IndexOptInfo	*indinfo;
+	List			*const_opposite_exprs;
+	List			*opfamilies;
+	List			*res;
+} UniqueExprsReduceContext;
+
+
+static bool
+simplified_unique_exprs_walker(Expr *index_expr,
+							   int i,
+							   UniqueExprsReduceContext *context)
+{
+	ListCell	*lc1, *lc2;
+	bool	found = false;
+	forboth(lc1, context->const_opposite_exprs, lc2, context->opfamilies)
+	{
+		Node	*expr = (Node *)lfirst(lc1);
+		List	*expr_families = lfirst(lc2);
+		if (list_member_oid(expr_families, context->indinfo->opfamily[i])
+			&& match_index_to_operand(expr, i, context->indinfo))
+			found = true;
+	}
+	if (!found)
+		context->res = lappend(context->res, index_expr);
+	return false;
+}
+
+
+/*
+ * make_simplified_unique_exprs
+ *
+ *	Build a list of exprs/EC from Unique Index, but before that, we would check
+ * if any expr is Const already in this query, if so, we would move it out
+ * of the exprs. If all the exprs in a unique index match with Consts, we would
+ * set *matched_whole_index to true.
+ */
+static List*
+make_simplified_unique_exprs(PlannerInfo *root, RelOptInfo *rel,
+							 List *unique_indexes,
+							 bool *matched_whole_index)
+{
+	List	*opfamilies = NIL, *res = NIL;
+	List	*clauses = extract_const_opposite_exprs(rel->baserestrictinfo, &opfamilies);
+	ListCell	*lc;
+
+	foreach(lc, unique_indexes)
+	{
+		List *exprs = NIL;
+		IndexOptInfo *indinfo = lfirst_node(IndexOptInfo, lc);
+		UniqueExprsReduceContext context = {indinfo, clauses, opfamilies, NIL};
+		index_keys_walker(indinfo, simplified_unique_exprs_walker, &context);
+		if (context.res == NIL)
+		{
+			*matched_whole_index = true;
+			return NIL;
+		}
+
+		foreach(lc, context.res)
+		{
+			Expr *expr = lfirst(lc);
+			EquivalenceClass *ec = find_ec_for_expr(root, expr, rel);
+			if (ec)
+			    exprs = lappend(exprs, ec);
+			else
+				exprs = lappend(exprs, expr);
+		}
+
+		res = lappend(res, exprs);
+	}
+	return res;
+}
+
+
+static List*
+get_usable_unique_indexes(List *indlist)
+{
+	ListCell *lc;
+	List *res = NIL;
+	foreach(lc, indlist)
+	{
+		IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);
+		if (!ind->unique || !ind->immediate ||
+			(ind->indpred != NIL && !ind->predOK))
+			continue;
+		res = lappend(res, ind);
+	}
+	return res;
+}
+
+
+
+/*
+ * make_usual_uniquekey
+ */
+static void
+make_usual_uniquekey(RelOptInfo *rel, List *exprs, bool multi_nullvals)
+{
+	UniqueKey * ukey = makeNode(UniqueKey);
+	ukey->exprs = exprs;
+	ukey->multi_nullvals = multi_nullvals;
+	rel->uniquekeys = lappend(rel->uniquekeys, ukey);
+}
+
+static void
+make_onerow_uniquekey(RelOptInfo *rel)
+{
+	UniqueKey *ukey = makeNode(UniqueKey);
+	ukey->exprs = NIL;
+	ukey->multi_nullvals = false;
+	rel->uniquekeys = list_make1(ukey);
+}
+
+static List *
+extract_const_opposite_exprs(List *restrict_list, List **opfamilies)
+{
+	ListCell	*lc;
+	List		*res = NIL;
+	foreach(lc, restrict_list)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
+		Node *leftop, *rightop;
+
+		if (rinfo->mergeopfamilies == NIL || rinfo->pseudoconstant)
+			continue;
+
+		leftop = get_leftop(rinfo->clause);
+		rightop = get_rightop(rinfo->clause);
+
+		if (leftop == NULL || rightop == NULL)
+			continue;
+
+		if (bms_is_empty(rinfo->left_relids) && !contain_volatile_functions(leftop))
+		{
+			res = lappend(res, rightop);
+			*opfamilies = lappend(*opfamilies, rinfo->mergeopfamilies);
+		}
+		else if (bms_is_empty(rinfo->right_relids) && !contain_volatile_functions(rightop))
+		{
+			res = lappend(res, leftop);
+			*opfamilies = lappend(*opfamilies, rinfo->mergeopfamilies);
+		}
+	}
+	return res;
+}
+
+
+
+static bool
+can_expr_be_referenced(RelOptInfo *rel, Expr *expr, Bitmapset *known_varattrs)
+{
+	if (IsA(expr, Var))
+		return bms_is_member(((Var*)expr)->varattno - FirstLowInvalidHeapAttributeNumber,
+							 known_varattrs);
+
+	return list_member(rel->reltarget->exprs, expr);
+}
+
+/*
+ * filter_useful_unique_exprs
+ *
+ *	Filter out any exprs which is impossible to be referenced later.
+ */
+static List *
+filter_useful_unique_exprs(RelOptInfo *rel, List *unique_exprs_list)
+{
+	List	*res = NIL;
+	ListCell	*lc;
+
+	Bitmapset *target_attrs = NULL;
+	foreach(lc, rel->reltarget->exprs)
+	{
+		Expr *expr = lfirst(lc);
+		if (IsA(expr, Var))
+		{
+			/* TODO: check RelOptInfo->attr_needed */
+			Var *var = (Var *)expr;
+			target_attrs = bms_add_member(target_attrs,
+										  var->varattno - FirstLowInvalidHeapAttributeNumber);
+		}
+	}
+
+	foreach(lc, unique_exprs_list)
+	{
+		List	*unique_exprs = lfirst_node(List, lc);
+		ListCell	*l;
+		bool	useful = true;
+		foreach(l, unique_exprs)
+		{
+			Expr *node = lfirst(l);
+			if (IsA(node, EquivalenceClass))
+			{
+				EquivalenceClass *ec = (EquivalenceClass *)node;
+				ListCell	*lc2;
+				bool any_matchs = false;
+				foreach(lc2, ec->ec_members)
+				{
+					Expr *expr = lfirst_node(EquivalenceMember, lc2)->em_expr;
+					if (can_expr_be_referenced(rel, expr, target_attrs))
+					{
+						any_matchs = true;
+						break;
+					}
+				}
+				if (!any_matchs)
+					useful = false;
+			}
+			else
+				useful = can_expr_be_referenced(rel, node, target_attrs);
+
+			if (!useful)
+				break;
+		}
+		if (useful)
+			res = lappend(res, unique_exprs);
+	}
+	return res;
+}
+
+
+static bool
+is_nullable_expr(Expr *expr, RelOptInfo *rel)
+{
+	if (IsA(expr, Var))
+	{
+		Var *var = (Var *)expr;
+		return !bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
+							  rel->notnullattrs);
+	}
+	/* Actually we can check more with baserestrictinfo, joininfo. */
+	return true;
+}
+
+
+/*
+ * make_uniquekey_from_exprs
+ *
+ *  translate expr to EquivalenceClass if possible before creating
+ * a real UniqueKey.
+ */
+static void
+make_uniquekey_from_exprs(RelOptInfo *rel,
+						  List *unique_exprs)
+{
+	ListCell	*lc;
+	bool	maynull = false;
+	foreach(lc, unique_exprs)
+	{
+		Expr *expr = lfirst(lc);
+		EquivalenceMember *em;
+
+		if (IsA(expr, EquivalenceClass))
+		{
+			EquivalenceClass *ec = (EquivalenceClass *)expr;
+			if (list_length(ec->ec_members) == 1)
+			{
+				/*
+				 * An non-1-member EC indicates it is not null.
+				 * Need double check with partitioned case.
+				 */
+				em = linitial_node(EquivalenceMember, ec->ec_members);
+				maynull = is_nullable_expr(em->em_expr, rel);
+			}
+		}
+		else
+			maynull = is_nullable_expr(expr, rel);
+
+		if (maynull)
+			break;
+	}
+	make_usual_uniquekey(rel, unique_exprs,  maynull);
+}
+
+/*
+ * any_ec_member_in_exprs
+ */
+static bool
+any_ec_member_in_exprs(EquivalenceClass *ec, List *exprs)
+{
+	ListCell	*lc;
+	foreach(lc, ec->ec_members)
+	{
+		if (list_member(exprs, lfirst_node(EquivalenceMember, lc)->em_expr))
+			return true;
+	}
+	return false;
+}
+
+/*
+ * Check if the exprs in UniqueKey are a subset of exprs
+ */
+static bool
+are_exprs_match_uniquekey(List *exprs, UniqueKey *ukey)
+{
+	ListCell	*lc;
+	foreach(lc, ukey->exprs)
+	{
+		Expr *expr = lfirst(lc);
+		if (IsA(expr, EquivalenceClass))
+		{
+			if (!any_ec_member_in_exprs((EquivalenceClass *)expr, exprs))
+				return false;
+		}
+		else
+			if (!list_member(exprs, expr))
+				return false;
+	}
+	return true;
+}
+
+
+bool
+relation_has_uniquekeys_for(List *exprs, RelOptInfo *rel, bool allow_multinulls)
+{
+	ListCell *lc;
+	if (rel->uniquekeys == NIL)
+		return false;
+
+	foreach(lc, rel->uniquekeys)
+	{
+		UniqueKey *ukey = lfirst_node(UniqueKey, lc);
+		if (!allow_multinulls && ukey->multi_nullvals)
+			continue;
+		if (are_exprs_match_uniquekey(exprs, ukey))
+			return true;
+	}
+	return false;
+}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index adf68d8790..dfc00eac0c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4749,6 +4749,11 @@ create_distinct_paths(PlannerInfo *root,
 	bool		allow_hash;
 	Path	   *path;
 	ListCell   *lc;
+	List		*distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
+														 parse->targetList);
+
+	if (relation_has_uniquekeys_for(distinctExprs, input_rel, false))
+		return input_rel;
 
 	/* For now, do all work in the (DISTINCT, NULL) upperrel */
 	distinct_rel = fetch_upper_rel(root, UPPERREL_DISTINCT, NULL);
@@ -4786,10 +4791,6 @@ create_distinct_paths(PlannerInfo *root,
 		/*
 		 * Otherwise, the UNIQUE filter has effects comparable to GROUP BY.
 		 */
-		List	   *distinctExprs;
-
-		distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
-												parse->targetList);
 		numDistinctRows = estimate_num_groups(root, distinctExprs,
 											  cheapest_input_path->rows,
 											  NULL);
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 731ff708b9..4e9d22d80b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -257,6 +257,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	rel->all_partrels = NULL;
 	rel->partexprs = NULL;
 	rel->nullable_partexprs = NULL;
+	rel->uniquekeys = NIL;
 
 	/*
 	 * Pass assorted information down the inheritance hierarchy.
@@ -2023,3 +2024,26 @@ build_child_join_reltarget(PlannerInfo *root,
 	childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
 	childrel->reltarget->width = parentrel->reltarget->width;
 }
+
+void
+index_keys_walker(IndexOptInfo *indinfo, bool (*walker) (), void *context)
+{
+	int	i = 0;
+	Expr	*expr;
+	ListCell	*ind_expr_pos = list_head(indinfo->indexprs);
+	for (i = 0; i < indinfo->ncolumns; ++i)
+	{
+		int attr = indinfo->indexkeys[i];
+		if (attr > 0)
+			expr = list_nth_node(TargetEntry, indinfo->indextlist, i)->expr;
+		else if (attr == 0)
+		{
+			expr = lfirst(ind_expr_pos);
+			ind_expr_pos = lnext(indinfo->indexprs, ind_expr_pos);
+		}
+		else
+			Assert(false);
+		if (walker(expr, i, context))
+			return;
+	}
+}
diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index eb113d94c1..e25b0eae2b 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -446,6 +446,30 @@ extract_actual_clauses(List *restrictinfo_list,
 	return result;
 }
 
+
+List *
+extract_mergable_clauses(List *restrictinfo_list,
+						 bool pseudoconstant,
+						 List **opfamilies)
+{
+	List	   *result = NIL;
+	ListCell   *l;
+
+	foreach(l, restrictinfo_list)
+	{
+		RestrictInfo *rinfo = lfirst_node(RestrictInfo, l);
+
+		if (rinfo->mergeopfamilies == NIL)
+			continue;
+		if (rinfo->pseudoconstant == pseudoconstant)
+		{
+			result = lappend(result, rinfo->clause);
+			*opfamilies = lappend(*opfamilies, rinfo->mergeopfamilies);
+		}
+	}
+	return result;
+}
+
 /*
  * extract_actual_join_clauses
  *
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 40ae489c23..18dc08f994 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -262,6 +262,7 @@ typedef enum NodeTag
 	T_EquivalenceMember,
 	T_PathKey,
 	T_PathTarget,
+	T_UniqueKey,
 	T_RestrictInfo,
 	T_IndexClause,
 	T_PlaceHolderVar,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 2d81f6216d..9b795f2780 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -740,7 +740,7 @@ typedef struct RelOptInfo
 												 * paths? (if partitioned rel) */
 	Relids		top_parent_relids;	/* Relids of topmost parents (if "other"
 									 * rel) */
-
+	List		*uniquekeys;
 	/* used for partitioned relations: */
 	PartitionScheme part_scheme;	/* Partitioning scheme */
 	int			nparts;			/* Number of partitions; -1 if not yet set; in
@@ -1050,6 +1050,12 @@ typedef struct PathKey
 	bool		pk_nulls_first; /* do NULLs come before normal values? */
 } PathKey;
 
+typedef struct UniqueKey
+{
+	NodeTag		type;
+	List	   *exprs;
+	bool		multi_nullvals;
+} UniqueKey;
 
 /*
  * PathTarget
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 035d3e1206..40737c5ba5 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -136,6 +136,7 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
 												  Relids rel,
 												  bool create_it);
 extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
+extern EquivalenceClass *find_ec_for_expr(PlannerInfo* root, Expr *expr, RelOptInfo *rel);
 extern Expr *find_em_expr_usable_for_sorting_rel(PlannerInfo *root,
 												 EquivalenceClass *ec,
 												 RelOptInfo *rel,
@@ -247,5 +248,14 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,
 									   int strategy, bool nulls_first);
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 									List *live_childrels);
+extern void index_keys_walker(IndexOptInfo *indinfo,
+							  bool (*walker) (),
+							  void *context);
+
+extern void populate_baserel_uniquekeys(PlannerInfo *root,
+										RelOptInfo *rel,
+										List *indexlist);
+extern bool relation_has_uniquekeys_for(List *exprs, RelOptInfo *rel,
+										bool allow_multinulls);
 
 #endif							/* PATHS_H */
diff --git a/src/include/optimizer/restrictinfo.h b/src/include/optimizer/restrictinfo.h
index 0165ffde37..3d23ef630f 100644
--- a/src/include/optimizer/restrictinfo.h
+++ b/src/include/optimizer/restrictinfo.h
@@ -37,6 +37,9 @@ extern bool restriction_is_securely_promotable(RestrictInfo *restrictinfo,
 extern List *get_actual_clauses(List *restrictinfo_list);
 extern List *extract_actual_clauses(List *restrictinfo_list,
 									bool pseudoconstant);
+extern List *extract_mergable_clauses(List *restrictinfo_list,
+									  bool pseudoconstant,
+									  List **opfamilies);
 extern void extract_actual_join_clauses(List *restrictinfo_list,
 										Relids joinrelids,
 										List **joinquals,
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 5c7528c029..525de82767 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4534,18 +4534,15 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
   on d.a = s.id;
-              QUERY PLAN              
---------------------------------------
+             QUERY PLAN             
+------------------------------------
  Merge Right Join
    Merge Cond: (b.id = d.a)
-   ->  Unique
-         ->  Sort
-               Sort Key: b.id, b.c_id
-               ->  Seq Scan on b
+   ->  Index Scan using b_pkey on b
    ->  Sort
          Sort Key: d.a
          ->  Seq Scan on d
-(9 rows)
+(6 rows)
 
 -- check join removal works when uniqueness of the join condition is enforced
 -- by a UNION
diff --git a/src/test/regress/expected/select_distinct.out b/src/test/regress/expected/select_distinct.out
index 11c6f50fbf..dba5b41a6f 100644
--- a/src/test/regress/expected/select_distinct.out
+++ b/src/test/regress/expected/select_distinct.out
@@ -306,3 +306,80 @@ SELECT null IS NOT DISTINCT FROM null as "yes";
  t
 (1 row)
 
+CREATE TABLE uqk1(a int, pk int primary key, b int,  c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, b int, c int,  d int);
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+ANALYZE uqk1;
+ANALYZE uqk2;
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+    QUERY PLAN    
+------------------
+ Seq Scan on uqk1
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+           QUERY PLAN            
+---------------------------------
+ HashAggregate
+   Group Key: c, d
+   ->  Seq Scan on uqk1
+         Filter: (c IS NOT NULL)
+(4 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL and d > 1;
+               QUERY PLAN                
+-----------------------------------------
+ Seq Scan on uqk1
+   Filter: ((c IS NOT NULL) AND (d > 1))
+(2 rows)
+
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+        QUERY PLAN         
+---------------------------
+ Seq Scan on uqk1
+   Filter: (c IS NOT NULL)
+(2 rows)
+
+-- Test reduce-const-exprs.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c = 1;
+    QUERY PLAN     
+-------------------
+ Seq Scan on uqk1
+   Filter: (c = 1)
+(2 rows)
+
+SET plan_cache_mode TO force_generic_plan ;
+prepare s as select distinct c from uqk1 where d = $1 and c is not null;
+explain (costs off) execute s(1);
+                QUERY PLAN                
+------------------------------------------
+ Seq Scan on uqk1
+   Filter: ((c IS NOT NULL) AND (d = $1))
+(2 rows)
+
+reset plan_cache_mode;
+-- Test onerow uniquekey.
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+     QUERY PLAN     
+--------------------
+ Seq Scan on uqk1
+   Filter: (pk = 1)
+(2 rows)
+
+-- Test EquivalenceClass
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = a;
+     QUERY PLAN     
+--------------------
+ Seq Scan on uqk1
+   Filter: (pk = a)
+(2 rows)
+
+EXPLAIN (COSTS OFF) SELECT DISTINCT a, b FROM uqk1 WHERE a = c AND b = d;
+           QUERY PLAN            
+---------------------------------
+ Seq Scan on uqk1
+   Filter: ((a = c) AND (b = d))
+(2 rows)
+
diff --git a/src/test/regress/sql/select_distinct.sql b/src/test/regress/sql/select_distinct.sql
index 33102744eb..51340130a7 100644
--- a/src/test/regress/sql/select_distinct.sql
+++ b/src/test/regress/sql/select_distinct.sql
@@ -135,3 +135,31 @@ SELECT 1 IS NOT DISTINCT FROM 2 as "no";
 SELECT 2 IS NOT DISTINCT FROM 2 as "yes";
 SELECT 2 IS NOT DISTINCT FROM null as "no";
 SELECT null IS NOT DISTINCT FROM null as "yes";
+
+
+CREATE TABLE uqk1(a int, pk int primary key, b int,  c int,  d int);
+CREATE TABLE uqk2(a int, pk int primary key, b int, c int,  d int);
+CREATE UNIQUE INDEX uqk1_ukcd ON uqk1(c, d);
+ANALYZE uqk1;
+ANALYZE uqk2;
+
+-- Test single table
+EXPLAIN (COSTS OFF) SELECT DISTINCT * FROM uqk1;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL and d > 1;
+ALTER TABLE uqk1 ALTER COLUMN d SET NOT NULL;
+EXPLAIN (COSTS OFF) SELECT DISTINCT c, d FROM uqk1 WHERE c is NOT NULL;
+
+-- Test reduce-const-exprs.
+EXPLAIN (COSTS OFF) SELECT DISTINCT d FROM uqk1 WHERE c = 1;
+SET plan_cache_mode TO force_generic_plan ;
+prepare s as select distinct c from uqk1 where d = $1 and c is not null;
+explain (costs off) execute s(1);
+reset plan_cache_mode;
+
+-- Test onerow uniquekey.
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = 1;
+-- Test EquivalenceClass
+EXPLAIN (COSTS OFF) SELECT DISTINCT a FROM uqk1 WHERE pk = a;
+EXPLAIN (COSTS OFF) SELECT DISTINCT a, b FROM uqk1 WHERE a = c AND b = d;
+
-- 
2.21.0

Dmitry Dolgov

9erthalion6@gmail.com

almost 5 years ago

In reply to: Andy Fan (#1)

Re: UniqueKey on Partitioned table.

On Sat, Feb 20, 2021 at 10:25:59AM +0800, Andy Fan wrote:

The attached is a UnqiueKey with EquivalenceClass patch, I just complete the
single relation part and may have bugs. I just attached it here for design
review only. and the not-null-attrs is just v1 which we can continue
discussing on the original thread[2].

Thanks for the patch. After a short look through it I'm a bit confused
and wanted to clarify, now uniquekeys list could contain both Expr and
EquivalenceClass?

Andy Fan

zhihui.fan1213@gmail.com

almost 5 years ago

In reply to: Dmitry Dolgov (#2)

Re: UniqueKey on Partitioned table.

On Sat, Mar 27, 2021 at 3:07 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Sat, Feb 20, 2021 at 10:25:59AM +0800, Andy Fan wrote:

The attached is a UnqiueKey with EquivalenceClass patch, I just complete

the

single relation part and may have bugs. I just attached it here for

design

review only. and the not-null-attrs is just v1 which we can continue
discussing on the original thread[2].

Thanks for the patch. After a short look through it I'm a bit confused
and wanted to clarify, now uniquekeys list could contain both Expr and
EquivalenceClass?

Yes, That's because I don't want to create a new EquivalenceClass (which
would make the PlannerInfo->eq_classes longer) if we don't have
one , then I just used one Expr instead for this case. However during the
test, I found some EquivalenceClass with only 1 EquivalenceMember
unexpectedly.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Ashutosh Bapat

ashutosh.bapat.oss@gmail.com

almost 5 years ago

In reply to: Andy Fan (#3)

Re: UniqueKey on Partitioned table.

On Sat, Mar 27, 2021 at 11:44 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Sat, Mar 27, 2021 at 3:07 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On Sat, Feb 20, 2021 at 10:25:59AM +0800, Andy Fan wrote:

The attached is a UnqiueKey with EquivalenceClass patch, I just complete the
single relation part and may have bugs. I just attached it here for design
review only. and the not-null-attrs is just v1 which we can continue
discussing on the original thread[2].

Thanks for the patch. After a short look through it I'm a bit confused
and wanted to clarify, now uniquekeys list could contain both Expr and
EquivalenceClass?

Yes, That's because I don't want to create a new EquivalenceClass (which
would make the PlannerInfo->eq_classes longer) if we don't have
one , then I just used one Expr instead for this case.
However during the
test, I found some EquivalenceClass with only 1 EquivalenceMember
unexpectedly.

Pathkeys may induce single member ECs. Why UniqueKeys are an exception?

--
Best Wishes,
Ashutosh Bapat

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: Ashutosh Bapat (#4)

Re: UniqueKey on Partitioned table.

On Tue, 30 Mar 2021 at 02:27, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Sat, Mar 27, 2021 at 11:44 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Sat, Mar 27, 2021 at 3:07 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Thanks for the patch. After a short look through it I'm a bit confused
and wanted to clarify, now uniquekeys list could contain both Expr and
EquivalenceClass?

Yes, That's because I don't want to create a new EquivalenceClass (which
would make the PlannerInfo->eq_classes longer) if we don't have
one , then I just used one Expr instead for this case.
However during the
test, I found some EquivalenceClass with only 1 EquivalenceMember
unexpectedly.

Pathkeys may induce single member ECs. Why UniqueKeys are an exception?

I doubt that it should be. get_eclass_for_sort_expr() makes
single-member ECs for sorts. I imagine the UniqueKey stuff should
copy that... However, get_eclass_for_sort_expr() can often dominate
the planning effort in queries to partitioned tables with a large
number of partitions when the query has an ORDER BY. Perhaps Andy is
trying to sidestep that issue?

I mentioned a few things in [1]/messages/by-id/CAApHDvoDMyw=hTuW-258yqNK4bhW6CpguJU_GZBh4x+rnoem3w@mail.gmail.com on what I think about this.

David

[1]: /messages/by-id/CAApHDvoDMyw=hTuW-258yqNK4bhW6CpguJU_GZBh4x+rnoem3w@mail.gmail.com

Andy Fan

zhihui.fan1213@gmail.com

almost 5 years ago

In reply to: David Rowley (#5)

Re: UniqueKey on Partitioned table.

On Tue, Mar 30, 2021 at 4:16 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 30 Mar 2021 at 02:27, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

On Sat, Mar 27, 2021 at 11:44 AM Andy Fan <zhihui.fan1213@gmail.com>

wrote:

On Sat, Mar 27, 2021 at 3:07 AM Dmitry Dolgov <9erthalion6@gmail.com>

wrote:

Thanks for the patch. After a short look through it I'm a bit confused
and wanted to clarify, now uniquekeys list could contain both Expr and
EquivalenceClass?

Yes, That's because I don't want to create a new EquivalenceClass

(which

would make the PlannerInfo->eq_classes longer) if we don't have
one , then I just used one Expr instead for this case.
However during the
test, I found some EquivalenceClass with only 1 EquivalenceMember
unexpectedly.

Pathkeys may induce single member ECs. Why UniqueKeys are an exception?

When working with UniqueKey, I do want to make PlannerInfo.eq_classes short,
so I don't want to create a new EC for UniqueKey only. After I realized we
have
so single-member ECs, I doubt if the "Expr in UniqueKey" will be executed
in real.
I still didn't get enough time to do more research about this.

I doubt that it should be. get_eclass_for_sort_expr() makes

single-member ECs for sorts.

Thanks for this hint. I can check more cases like this.

I imagine the UniqueKey stuff should
copy that... However, get_eclass_for_sort_expr() can often dominate

the planning effort in queries to partitioned tables with a large

number of partitions when the query has an ORDER BY. Perhaps Andy is
trying to sidestep that issue?

Yes. a long PlannerInfo.eq_classes may make some finding slow, and in
my UniqueKey patch, I am trying to not make it longer.

I mentioned a few things in [1] on what I think about this.

David

[1]
/messages/by-id/CAApHDvoDMyw=hTuW-258yqNK4bhW6CpguJU_GZBh4x+rnoem3w@mail.gmail.com

I appreciate all of the people who helped on this patch and others. I would
like to share more of my planning. As for the UniqueKey patch, there are
some
design decisions that need to be made. In my mind, the order would be:
a). How to present the notnullattrs probably in [1]/messages/by-id/CAKU4AWp3WKyrMKNdg46BvQRD7xkNL9UsLLcLhd5ao=FSbnaN_Q@mail.gmail.com b). How to present
the element
in UniqueKey. Prue EquivalenceClasses or Mix of Expr and EquivalenceClass
as
we just talked about. c). How to maintain the UniqueKey Partitioned table
in the
beginning of this thread. As for a) & c). I have my current proposal for
discussion.
as for b) I think I need more thinking about this. Based on the idea
above, I am
not willing to move too fast on the following issue unless the previous
issue
can be addressed. Any feedback/suggestion about my current planning is
welcome.

[1]: /messages/by-id/CAKU4AWp3WKyrMKNdg46BvQRD7xkNL9UsLLcLhd5ao=FSbnaN_Q@mail.gmail.com
/messages/by-id/CAKU4AWp3WKyrMKNdg46BvQRD7xkNL9UsLLcLhd5ao=FSbnaN_Q@mail.gmail.com

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Ashutosh Bapat

ashutosh.bapat.oss@gmail.com

almost 5 years ago

In reply to: Andy Fan (#6)

Re: UniqueKey on Partitioned table.

b). How to present the element
in UniqueKey. Prue EquivalenceClasses or Mix of Expr and EquivalenceClass as
we just talked about.

I think the reason we add ECs for sort expressions is to use
transitive relationship. The EC may start with a single member but
later in the planning that member might find partners which are all
equivalent. Result ordered by one is also ordered by the other. The
same logic applies to UniqueKey as well, isn't it. In a result if a
set of columns makes a row unique, the set of columns represented by
the other EC member should be unique. Though a key will start as a
singleton it might EC partners later and thus thus unique key will
transition to all the members. With that logic UniqueKey should use
just ECs instead of bare expressions.

--
Best Wishes,
Ashutosh Bapat

Andy Fan

zhihui.fan1213@gmail.com

almost 5 years ago

In reply to: Ashutosh Bapat (#7)

Re: UniqueKey on Partitioned table.

On Wed, Mar 31, 2021 at 9:12 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
wrote:

b). How to present the element
in UniqueKey. Prue EquivalenceClasses or Mix of Expr and

EquivalenceClass as

we just talked about.

I think the reason we add ECs for sort expressions is to use
transitive relationship. The EC may start with a single member but
later in the planning that member might find partners which are all
equivalent. Result ordered by one is also ordered by the other. The
same logic applies to UniqueKey as well, isn't it. In a result if a
set of columns makes a row unique, the set of columns represented by
the other EC member should be unique. Though a key will start as a
singleton it might EC partners later and thus thus unique key will
transition to all the members. With that logic UniqueKey should use
just ECs instead of bare expressions.

TBH, I haven't thought about this too hard, but I think when we build the
UniqueKey, all the ECs have been built already. So can you think out an
case we start with an EC with a single member at the beginning and
have more members later for UniqueKey cases?

--
Best Regards
Andy Fan (https://www.aliyun.com/)

David Rowley

dgrowleyml@gmail.com

almost 5 years ago

In reply to: Andy Fan (#8)

Re: UniqueKey on Partitioned table.

On Tue, 6 Apr 2021 at 22:31, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Wed, Mar 31, 2021 at 9:12 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

I think the reason we add ECs for sort expressions is to use
transitive relationship. The EC may start with a single member but
later in the planning that member might find partners which are all
equivalent. Result ordered by one is also ordered by the other. The
same logic applies to UniqueKey as well, isn't it. In a result if a
set of columns makes a row unique, the set of columns represented by
the other EC member should be unique. Though a key will start as a
singleton it might EC partners later and thus thus unique key will
transition to all the members. With that logic UniqueKey should use
just ECs instead of bare expressions.

TBH, I haven't thought about this too hard, but I think when we build the
UniqueKey, all the ECs have been built already. So can you think out an
case we start with an EC with a single member at the beginning and
have more members later for UniqueKey cases?

I don't really know if it matters which order things happen. We still
end up with a single EC containing {a,b} whether we process ORDER BY b
or WHERE a=b first.

In any case, the reason we want PathKeys to be ECs rather than just
Exprs is to allow cases such as the following to use an index to
perform the sort.

# create table ab (a int, b int);
# create index on ab(a);
# set enable_seqscan=0;
# explain select * from ab where a=b order by b;
QUERY PLAN
---------------------------------------------------------------------
Index Scan using ab_a_idx on ab (cost=0.15..83.70 rows=11 width=8)
Filter: (a = b)
(2 rows)

Of course, we couldn't use this index to provide pre-sorted results if
"where a=b" hadn't been there.

David

#10

Andy Fan

zhihui.fan1213@gmail.com

over 4 years ago

In reply to: David Rowley (#9)

Re: UniqueKey on Partitioned table.

On Tue, Apr 6, 2021 at 6:55 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 6 Apr 2021 at 22:31, Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Wed, Mar 31, 2021 at 9:12 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

I think the reason we add ECs for sort expressions is to use
transitive relationship. The EC may start with a single member but
later in the planning that member might find partners which are all
equivalent. Result ordered by one is also ordered by the other. The
same logic applies to UniqueKey as well, isn't it. In a result if a
set of columns makes a row unique, the set of columns represented by
the other EC member should be unique. Though a key will start as a
singleton it might EC partners later and thus thus unique key will
transition to all the members. With that logic UniqueKey should use
just ECs instead of bare expressions.

TBH, I haven't thought about this too hard, but I think when we build the
UniqueKey, all the ECs have been built already. So can you think out an
case we start with an EC with a single member at the beginning and
have more members later for UniqueKey cases?

I don't really know if it matters which order things happen. We still
end up with a single EC containing {a,b} whether we process ORDER BY b
or WHERE a=b first.

I think it is time to talk about this again. Take the below query as example:

SELECT * FROM t1, t2 WHERE t1.pk = t2.pk;

Then when I populate_baserel_uniquekeys for t1, we already have
EC{Members={t1.pk, t2.pk}} in root->eq_classes already. Then I use
this EC directly for t1's UniqueKey. The result is:

T1's UniqueKey : [ EC{Members={t1.pk, t2.pk}} ].

*Would this be OK since at the baserel level, the "t1.pk = t2.pk" is not
executed yet?*

I tried the below example to test how PathKey is maintained.
CREATE TABLE t1 (a INT, b INT);
CREATE TABLE t2 (a INT, b INT);
CREATE INDEX ON t1(b);

SELECT * FROM t1, t2 WHERE t1.b = t2.b and t1.b > 3;

then we can get t1's Path:

Index Scan on (b), PathKey.pk_class include 2 members (t1.b, t2.b}
even before the Join.

So looks the answer for my question should be "yes"? Hope I have
made myself clear.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

#11

David Rowley

dgrowleyml@gmail.com

over 4 years ago

In reply to: Andy Fan (#10)

Re: UniqueKey on Partitioned table.

On Sat, 17 Jul 2021 at 19:32, Andy Fan <zhihui.fan1213@gmail.com> wrote:

SELECT * FROM t1, t2 WHERE t1.pk = t2.pk;

Then when I populate_baserel_uniquekeys for t1, we already have
EC{Members={t1.pk, t2.pk}} in root->eq_classes already. Then I use
this EC directly for t1's UniqueKey. The result is:

T1's UniqueKey : [ EC{Members={t1.pk, t2.pk}} ].

*Would this be OK since at the baserel level, the "t1.pk = t2.pk" is not
executed yet?*

I tried the below example to test how PathKey is maintained.
CREATE TABLE t1 (a INT, b INT);
CREATE TABLE t2 (a INT, b INT);
CREATE INDEX ON t1(b);

SELECT * FROM t1, t2 WHERE t1.b = t2.b and t1.b > 3;

then we can get t1's Path:

Index Scan on (b), PathKey.pk_class include 2 members (t1.b, t2.b}
even before the Join.

So looks the answer for my question should be "yes"? Hope I have
made myself clear.

I don't see the problem. The reason PathKeys use EquivalenceClasses is
so that queries like: SELECT * FROM tab WHERE a=b ORDER BY b; can see
that they're also ordered by a. This is useful because if there
happens to be an index on tab(a) then we can use it to provide the
required ordering for this query.

We'll want the same with UniqueKeys. The same thing there looks like:

CREATE TABLE tab (a int primary key, b int not null);

select distinct b from tab where a=b;

Since we have the EquivalenceClass with {a,b} stored in the UniqueKey,
then we should be able to execute this without doing any distinct
operation.

David

#12

Andy Fan

zhihui.fan1213@gmail.com

over 4 years ago

In reply to: David Rowley (#11)

Re: UniqueKey on Partitioned table.

On Sat, Jul 17, 2021 at 3:45 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Sat, 17 Jul 2021 at 19:32, Andy Fan <zhihui.fan1213@gmail.com> wrote:

SELECT * FROM t1, t2 WHERE t1.pk = t2.pk;

Then when I populate_baserel_uniquekeys for t1, we already have
EC{Members={t1.pk, t2.pk}} in root->eq_classes already. Then I use
this EC directly for t1's UniqueKey. The result is:

T1's UniqueKey : [ EC{Members={t1.pk, t2.pk}} ].

*Would this be OK since at the baserel level, the "t1.pk = t2.pk" is not
executed yet?*

I tried the below example to test how PathKey is maintained.
CREATE TABLE t1 (a INT, b INT);
CREATE TABLE t2 (a INT, b INT);
CREATE INDEX ON t1(b);

SELECT * FROM t1, t2 WHERE t1.b = t2.b and t1.b > 3;

then we can get t1's Path:

Index Scan on (b), PathKey.pk_class include 2 members (t1.b, t2.b}
even before the Join.

So looks the answer for my question should be "yes"? Hope I have
made myself clear.

I don't see the problem.

Thanks for the double check, that removes a big blocker for my development.
I'd submit a new patch very soon.

The reason PathKeys use EquivalenceClasses is
so that queries like: SELECT * FROM tab WHERE a=b ORDER BY b; can see
that they're also ordered by a. This is useful because if there
happens to be an index on tab(a) then we can use it to provide the
required ordering for this query.

We'll want the same with UniqueKeys. The same thing there looks like:

CREATE TABLE tab (a int primary key, b int not null);

select distinct b from tab where a=b;

Since we have the EquivalenceClass with {a,b} stored in the UniqueKey,
then we should be able to execute this without doing any distinct
operation.

David

--
Best Regards
Andy Fan (https://www.aliyun.com/)